Reinforcement Learning is a branch of Machine Learning where an agent outputs an action and the environment returns an observation, or the state of the system, and a reward. The goal of the agent is to determine the best action to take. Usually, RL is described in terms of this agent interacting with a previously unknown environment, trying to maximize the overall or total reward.

Now then, what is Deep RL? Well, in some sense, it is using nonlinear function approximators to calculate the value of actions based directly on observations from the environment. We represent them as Deep Neural Networks, and we then use Deep Learning to find the optimal parameters for these function approximators. You have already worked with some Deep Learning neural networks for classification, detection, and semantic segmentation. However, those Deep Learning applications use labeled training data for supervised learning, and the inference engine then produces a best-guess label, not an action, as the output. When an RL agent handles the entire end-to-end pipeline, it's called pixels-to-action, referring to the network's ability to take raw sensor data and choose the action it thinks will best maximize its reward.

Over time, RL agents have an uncanny knack for developing intuitive, human-like behaviors, like learning to walk or peeking around corners when they're unsure. They naturally incorporate elements of exploration and knowledge gathering, which makes them good for imitating behaviors and performing path planning. Robots operating in unstructured environments tend to greatly benefit from RL agents, which give them a way to make sense of an environment that can be hard to model in advance.
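To make that agent-environment loop and the pixels-to-action idea concrete, here is a minimal sketch in Python. PyTorch, the toy `ToyEnv` environment, and the network sizes are all illustrative assumptions, not something the lecture specifies: the point is just that a neural network maps a raw observation directly to one value estimate per action, and the agent picks actions from those estimates while occasionally exploring.

```python
# A minimal sketch of the agent-environment loop with a "pixels-to-action"
# Q-network. PyTorch and ToyEnv are illustrative assumptions; the lecture
# does not name a specific library or environment.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Nonlinear function approximator: raw observation -> value of each action."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),  # one value estimate per action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class ToyEnv:
    """Stand-in environment: returns an observation (state) and a reward."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 4):
        self.obs_dim, self.n_actions = obs_dim, n_actions

    def reset(self) -> torch.Tensor:
        return torch.randn(self.obs_dim)

    def step(self, action: int):
        obs = torch.randn(self.obs_dim)       # next state of the system
        reward = 1.0 if action == 0 else 0.0  # arbitrary reward signal
        return obs, reward


env = ToyEnv()
q_net = QNetwork(env.obs_dim, env.n_actions)

obs, total_reward = env.reset(), 0.0
for step in range(100):
    # Epsilon-greedy: mostly exploit the current value estimates,
    # but sometimes explore with a random action.
    if torch.rand(1).item() < 0.1:
        action = torch.randint(env.n_actions, (1,)).item()
    else:
        with torch.no_grad():
            action = q_net(obs).argmax().item()
    obs, reward = env.step(action)
    total_reward += reward

print(f"total reward over episode: {total_reward}")
```

The Deep Learning part the lecture mentions, finding the optimal parameters of the function approximator, is not shown here: a full agent would add a training step (for example, a DQN-style temporal-difference update) that adjusts the network's weights using the rewards it collects.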