3 – Quiz: MC Control Methods

In this lesson, we’ll draft several new algorithms to solve the reinforcement learning problem. We’ll begin by reviewing how Monte Carlo Control works using our small grid world example. Remember that we keep track of a Q-table, which contains, for each state-action pair, the return that we expect to get. To update the Q-table, we sample a complete episode. For instance, this episode contains information that we can use to update the values of three separate state-action pairs in the table. But for now, let’s focus our attention on the first state-action pair.

Remember the equation that we use to update the Q-table: the new value is the current value plus alpha times the difference between the return G and the current value. Capital G is just the return that was collected by the agent, and alpha is some small positive number that you set. In this case, the current value in the Q-table is six. In other words, before we collected the episode, we would expect that choosing action right in state one would yield a return of six. But then, after collecting the episode, we see that we instead got a return of eight. So what Monte Carlo Control does is update the Q-table to push this value of six a little bit closer to eight.

You can think of this equation as comparing the return that we expected to the return that we actually experienced. Then, if those numbers don’t agree, we change the Q-table just a little so that our expectations line up better with reality. The new value in the table will be something between six and eight. Smaller values of alpha keep it closer to six, and larger values of alpha push it closer to eight. But now, to make sure we’re on the same page, you’ll plug in numbers to calculate this update in the quiz question below. Then, we’ll be ready to introduce some new algorithms.
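As a rough sketch of the update described above, the snippet below applies the constant-alpha rule to the single state-action pair from the example, with a current estimate of six and a sampled return of eight. The function name `mc_update` and the alpha values tried are illustrative assumptions, not values from the lesson or the quiz.

```python
def mc_update(q_current, g_return, alpha):
    """Constant-alpha Monte Carlo update for one state-action pair.

    Moves the current estimate a fraction alpha of the way toward
    the return that was actually observed in the episode.
    """
    return q_current + alpha * (g_return - q_current)


q_old = 6.0  # value currently stored in the Q-table (from the example)
g = 8.0      # return collected in the sampled episode (from the example)

# The alpha values below are assumptions chosen only for illustration.
for alpha in (0.01, 0.1, 0.5, 1.0):
    q_new = mc_update(q_old, g, alpha)
    print(f"alpha={alpha}: Q moves from {q_old} to {q_new:.2f}")
```

For any alpha between zero and one, the result lands between six and eight. With alpha equal to one, the old estimate is discarded and replaced by the observed return.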