1 – Meet the Careers Team

Hi, I'm Kathleen, and I lead the Careers team at Udacity. We support students in their job search. The Careers team knows what employers are looking for in a job candidate, and we want to work with you to market yourself as the best person for the job, whether it’s a new role or a promotion … Read more

9 – 10 Dueling DQN V2

The final enhancement of DQNs that we will briefly look at is appropriately titled Dueling networks. Here is a typical DQN architecture. A sequence of convolutional layers followed by a couple of fully connected layers that produce Q values. The core idea of dueling networks is to use two streams, one that estimates the state … Read more
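The excerpt above describes two streams on top of a shared feature extractor. Below is a minimal sketch of that idea in PyTorch (the framework and layer sizes are my assumptions, not taken from the lesson): one head estimates the state value V(s), the other the advantages A(s, a), and the two are recombined into Q-values.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Illustrative dueling Q-network for 84x84 image inputs."""

    def __init__(self, in_channels=4, num_actions=6):
        super().__init__()
        # Shared convolutional feature extractor, as in a typical DQN.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * 7 * 7  # feature size for an 84x84 input
        # Stream 1: scalar state value V(s).
        self.value = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                   nn.Linear(512, 1))
        # Stream 2: advantage A(s, a) for each action.
        self.advantage = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                       nn.Linear(512, num_actions))

    def forward(self, x):
        f = self.features(x)
        v = self.value(f)        # shape: (batch, 1)
        a = self.advantage(f)    # shape: (batch, num_actions)
        # Recombine the streams; subtracting the mean advantage keeps the
        # decomposition identifiable: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
        return v + a - a.mean(dim=1, keepdim=True)
```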

8 – 10 Prioritized Experience Replay V1

All right. The next issue we’ll look at is related to experience replay. Recall the basic idea behind it. We interact with the environment to collect experience tuples, save them in a buffer, and then later, we randomly sample a batch to learn from. This helps us break the correlation between consecutive experiences and stabilizes … Read more
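As a rough illustration of prioritizing the replay buffer, here is a simplified sketch in Python where the priority of each tuple is its absolute TD error plus a small constant. The class name, the `alpha`/`beta` hyperparameters, and the list-based storage are illustrative assumptions; an efficient implementation would typically use a sum tree instead.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Simplified prioritized replay: sample tuples in proportion to TD error."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, experience, td_error):
        # Priority is |TD error| plus a small constant so nothing gets zero
        # probability; alpha controls how strongly prioritization is applied.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.buffer) >= self.capacity:       # drop the oldest tuple
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(experience)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct for the non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.buffer[i] for i in idx], idx, weights
```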

7 – 10 Double DQN V2

The first problem we’re going to address is the overestimation of action values that Q-learning is prone to. Let’s look back at the update rule for Q-learning with function approximation, and focus on the TD target. Here, the max operation is necessary to find the best possible value we could get from the next state. … Read more
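A compact way to see how Double DQN tempers that overestimation is in how the TD target is computed: the online network selects the best next action, while a separate target network evaluates it. The sketch below assumes PyTorch and hypothetical `online_net`/`target_net` objects; it is not the lesson's own code.

```python
import torch

def double_dqn_targets(rewards, next_states, dones, gamma,
                       online_net, target_net):
    """rewards, dones: (batch, 1) float tensors; next_states: batch of states."""
    with torch.no_grad():
        # Select the greedy next action with the online network ...
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... but evaluate that action with the separate target network.
        next_q = target_net(next_states).gather(1, best_actions)
    # TD target: r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    return rewards + gamma * next_q * (1 - dones)
```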

6 – Deep Q-Learning Algorithm

We’re now ready to take a look at the Deep Q-Learning Algorithm and implement it on our own. There are two main processes that are interleaved in this algorithm. One is where we sample the environment by performing actions and store away the observed experience tuples in a replay memory. The other is where we … Read more
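The two interleaved processes can be sketched as a single training loop. The Python below is pseudocode-level: `env`, `agent`, and `memory` are assumed objects with illustrative interfaces, not the lesson's actual implementation.

```python
def deep_q_learning(env, agent, memory, num_episodes=1000,
                    batch_size=64, learn_every=4):
    for episode in range(num_episodes):
        state = env.reset()
        done = False
        t = 0
        while not done:
            # SAMPLE: act in the environment and store the experience tuple.
            action = agent.act(state)                      # e.g. epsilon-greedy
            next_state, reward, done = env.step(action)
            memory.add((state, action, reward, next_state, done))
            state = next_state
            # LEARN: every few steps, train on a random minibatch of tuples.
            if t % learn_every == 0 and len(memory) >= batch_size:
                batch = memory.sample(batch_size)
                agent.learn(batch)
            t += 1
```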

5 – Fixed Q-Targets

Experience replay helps us address one type of correlation: that between consecutive experience tuples. There is another kind of correlation that Q-learning is susceptible to. Q-learning is a form of Temporal Difference or TD learning, right? Here, R plus gamma times the maximum possible value from the next state is called the TD target. … Read more
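To make the fixed-target idea concrete, here is a sketch of one learning step in PyTorch (framework, function names, and the soft-update rule are my assumptions): the TD target is computed with a separate target network whose weights stay fixed during the update and are only nudged toward the online network afterwards.

```python
import torch
import torch.nn.functional as F

def learn_step(batch, online_net, target_net, optimizer, gamma=0.99, tau=1e-3):
    states, actions, rewards, next_states, dones = batch
    # TD target: R + gamma * max_a Q(S', a; w-), computed with the frozen
    # target-network weights w- so the target does not move with every update.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1, keepdim=True)[0]
        td_target = rewards + gamma * max_next_q * (1 - dones)
    q_expected = online_net(states).gather(1, actions)
    loss = F.mse_loss(q_expected, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Soft-update: slowly blend the online weights into the target network.
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.copy_(tau * o_param.data + (1.0 - tau) * t_param.data)
```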

4 – Experience Replay

The idea of experience replay and its application to training neural networks for reinforcement learning isn’t new. It was originally proposed to make more efficient use of observed experiences. Consider the basic online Q-learning algorithm where we interact with the environment and, at each time step, obtain a state, action, reward, next state tuple. … Read more
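A uniform replay buffer is simple to sketch in Python; the class and method names below are illustrative choices rather than the lesson's implementation. Storing tuples in a fixed-size buffer and sampling random minibatches from it breaks the correlation between consecutive experiences.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of experience tuples with uniform random sampling."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest tuples fall off the end

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling decorrelates consecutive experiences.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```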

3 – Deep Q-Networks

In 2015, DeepMind made a breakthrough by designing an agent that learned to play video games better than humans. Yes, it’s probably easy to write a program that plays Pong perfectly if you have access to the underlying game state, position of the ball, paddles, et cetera. But this agent was only given raw … Read more

2 – DQN Overview

The Deep Q-Network algorithm has caused a lot of buzz around Deep RL since 2013. It’s more or less an online version of the Neural Fitted Q Iteration paper from 2005 by Martin Riedmiller, which introduced training of a Q-value function represented by a multilayer perceptron. There are a few very useful additions and … Read more

10 – Summary

I hope you got a good sense of deep Q-learning and how it combines the best of reinforcement learning with recent advances in deep learning. I also hope that you realize how this opens up a world of possibilities for you to experiment with different neural net architectures, value functions, learning algorithms, and not to … Read more

1 – From RL to Deep RL

Reinforcement Learning is a branch of Machine Learning where an agent outputs an action and the environment returns an observation, or the state of the system, and a reward. The goal of the agent is to determine the best action to take. Usually, RL is described in terms of this agent interacting with the … Read more
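That interaction loop can be shown in a few lines of Python. The sketch below uses the Gymnasium library and a random policy purely as assumed stand-ins for "the environment" and "the agent"; the lesson itself does not prescribe either.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
state, _ = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                   # agent outputs an action
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward                               # environment returns a reward
    done = terminated or truncated                       # and the next observation/state
print("Episode return:", total_reward)
```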

1 – Deep RL in Robotics

In 2015, Google DeepMind’s AlphaGo made big news in the AI world using deep reinforcement learning. AlphaGo beat a human professional in the very complex game of Go. We’ve seen computers win against humans before. So, what was so special this time? During the game, AlphaGo used original moves that it had come up with … Read more