Welcome to Deep Reinforcement Learning. Interest in the field of reinforcement learning seems to have exploded with success stories like AlphaGo and platforms like OpenAI Gym. Research in this area has been moving at a steady pace since the 1980s, but it has really taken off with recent advances in deep learning. As we progress through this module, you will design intelligent agents that can learn to carry out complex control tasks. These range from simple domains like physics problems and board games, to video games where the agent processes raw pixel data, and even robotics. My favorite part of reinforcement learning is watching an agent grow and get better and better at a task. This is not always easy to achieve, but once you can get your agent to learn the nuances of a task, it can perform it flawlessly from then on. And that is the most rewarding experience in the world.

Before we dive into Deep Reinforcement Learning, let's quickly review some fundamental concepts. Reinforcement learning problems are typically framed as Markov Decision Processes, or MDPs. An MDP consists of a set of states S and actions A, along with probabilities P, rewards R, and a discount factor gamma. P captures how frequently different transitions and rewards occur, often modeled as a single joint probability where the state and reward at time step t plus one depend only on the state and action taken at the previous time step t. This characteristic of certain environments is known as the Markov property.

There are two quantities we are typically interested in: the value of a state, V(S), which we try to estimate or predict, and the value of an action taken in a certain state, Q(S, A), which can help us decide what action to take. These two mappings, or functions, are very much interrelated, and they help us find an optimal policy for our problem, pi star, that maximizes the total reward received. Note that since MDPs are probabilistic in nature, we can't predict with complete certainty what future rewards we will get and for how long, so we typically aim for the total expected reward. This is where the discount factor gamma comes into play as well: it is used to assign a lower weight to future rewards when computing state and action values.

Reinforcement learning algorithms are generally classified into two groups. Model-based approaches, such as policy iteration and value iteration, require a known transition and reward model; they essentially apply dynamic programming to iteratively compute the desired value functions and optimal policies using that model. On the other hand, model-free approaches, including Monte Carlo methods and temporal-difference learning, don't require an explicit model; they sample the environment by carrying out exploratory actions and use the experience gained to directly estimate value functions. Okay, obviously there is more to it, but that's reinforcement learning in a nutshell.

Deep Reinforcement Learning is a relatively recent term that refers to approaches that use deep learning, mainly multi-layer neural networks, to solve reinforcement learning problems. Now, reinforcement learning is typically characterized by finite MDPs, where the number of states and actions is limited. But there are so many problems where the space of states and actions is very large, or even made up of continuous, real-valued numbers. Traditional algorithms use a table, a dictionary, or some other finite structure to capture state and action values, and they no longer work for such problems.
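To make the tabular, model-free idea concrete, here is a minimal sketch of one-step Q-learning (a temporal-difference method) on a toy MDP. The environment, its state and action names, and the hyperparameters are all made up for illustration; only the update rule toward the bootstrapped target r + gamma * max Q(s', a') is the standard part.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a tiny finite set of states and actions,
# so a dictionary is enough to hold Q(s, a).
states = ["low", "high"]
actions = ["wait", "charge"]

def step(state, action):
    """Illustrative transition and reward model (invented for this sketch)."""
    if action == "charge":
        return "high", 1.0
    # Waiting in the low state occasionally incurs a penalty.
    if state == "low" and random.random() < 0.3:
        return "low", -1.0
    return "low", 0.0

Q = defaultdict(float)            # tabular action-value function Q(s, a)
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = "high"
for _ in range(10_000):
    # Epsilon-greedy exploration over the finite action set.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])

    next_state, reward = step(state, action)

    # One-step TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    state = next_state

print({k: round(v, 2) for k, v in Q.items()})
```

The key limitation is visible right in the data structure: every (state, action) pair needs its own entry in the table, and that lookup is exactly what breaks down once states are high-dimensional or continuous.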
So, the first thing you will learn in this module is how to generalize these algorithms to work with large and continuous spaces. That lays the foundation for developing Deep Reinforcement Learning algorithms, including value-based techniques like Deep Q-Learning, and those that directly try to optimize the policy, such as Policy Gradients. Finally, you will look at more advanced approaches that try to combine the best of both worlds: Actor-Critic methods. These algorithms can be hard to understand, so don't worry if you find them challenging at first. Make sure you practice implementing the core components of these algorithms, apply them to various environments, and observe how they perform. That is the only way to master Deep Reinforcement Learning.
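As a preview of what replacing the table looks like, here is a minimal value-based sketch in the spirit of Deep Q-Learning, written with PyTorch. The state dimension, action count, network size, and hyperparameters are assumptions chosen for illustration, not the exact setup used later in the module.

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration: a 4-dimensional continuous state
# (think of a simple physics problem) and 2 discrete actions.
state_dim, n_actions = 4, 2

# The Q-table is replaced by a small multi-layer network that maps a
# state vector to one estimated Q-value per action.
q_net = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def td_update(state, action, reward, next_state, done):
    """One gradient step toward the TD target r + gamma * max_a' Q(s', a')."""
    q_value = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max() * (1.0 - done)
    loss = nn.functional.mse_loss(q_value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy transition, just to show the call; a real agent would collect
# transitions by interacting with an environment.
s = torch.rand(state_dim)
s_next = torch.rand(state_dim)
td_update(s, action=0, reward=1.0, next_state=s_next, done=0.0)
```

Full Deep Q-Learning adds pieces such as an experience replay buffer and a separate target network to keep this update stable; this sketch only shows how a network can stand in for the table when the state space is continuous.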