7 – TD Control: Expected Sarsa

So far, you’ve implemented Sarsa and Sarsamax, and we’ll now discuss one more option. This new option is called Expected Sarsa, and it closely resembles Sarsamax; the only difference is in the update step for the action value. Remember that Sarsamax, or Q-learning, took the maximum over all actions of all possible next …
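Concretely, where Sarsamax plugs the maximum action value at the next state into its target, Expected Sarsa plugs in the expected action value under the current epsilon-greedy policy. A minimal sketch of that update step, assuming NumPy, a Q-table indexed as Q[state][action], and nA actions (the function name and signature are illustrative, not from the lesson):

```python
import numpy as np

def expected_sarsa_update(Q, state, action, reward, next_state,
                          alpha, gamma, epsilon, nA):
    """One Expected Sarsa update: the target uses the expected value of
    Q[next_state] under the epsilon-greedy policy, not the max."""
    # Action probabilities of the epsilon-greedy policy at next_state
    policy = np.full(nA, epsilon / nA)
    policy[np.argmax(Q[next_state])] += 1.0 - epsilon
    expected_value = np.dot(policy, Q[next_state])
    # Standard TD update toward reward + gamma * expected value
    Q[state][action] += alpha * (reward + gamma * expected_value
                                 - Q[state][action])
```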

6 – TD Control: Sarsamax

So far, you already have one algorithm for temporal difference control. Remember that in the Sarsa algorithm, we begin by initializing all action values to zero and constructing the corresponding epsilon-greedy policy. Then, the agent begins interacting with the environment and receives the first state. Next, it uses the policy to choose its action. …
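Sarsamax changes only the update step of that recipe: the agent still behaves epsilon-greedily, but the update target uses the greedy (max) action value at the next state. A sketch of one Sarsamax (Q-learning) episode under those assumptions; the classic Gym-style API in which env.step returns a 4-tuple is an assumption, as is every name here:

```python
import numpy as np

def epsilon_greedy(Q, state, nA, epsilon):
    """Greedy action with probability 1 - epsilon, random otherwise."""
    if np.random.random() > epsilon:
        return int(np.argmax(Q[state]))
    return np.random.randint(nA)

def sarsamax_episode(env, Q, nA, alpha, gamma, epsilon):
    """One episode of Sarsamax (Q-learning). Q can be, e.g., a
    defaultdict(lambda: np.zeros(nA)) mapping states to action values."""
    state = env.reset()
    done = False
    while not done:
        action = epsilon_greedy(Q, state, nA, epsilon)
        next_state, reward, done, _ = env.step(action)  # classic Gym 4-tuple API
        # Target bootstraps with the max over next actions (zero at termination)
        target = reward if done else reward + gamma * np.max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
```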

5 – TD Control: Sarsa, Part 2

We began this lesson by reviewing Monte Carlo control. Remember this was the corresponding update equation. In order to use it, we sample a complete episode. Then, we look up the current estimate in the Q-table and compare it to the return that we actually experienced after visiting the state-action pair. We use …
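The excerpt cuts off before showing the equation, but the two update rules being compared can be sketched side by side; the variable names are assumptions, with the Q-table indexed as Q[s][a]:

```python
def mc_update(Q, s, a, G, alpha):
    """Constant-alpha Monte Carlo control: after a complete episode,
    move Q(s, a) toward the full return G that followed (s, a)."""
    Q[s][a] += alpha * (G - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Sarsa: the same idea, but bootstrap mid-episode with the current
    estimate of the next state-action pair instead of the full return."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
```

The only structural change is the target: the full sampled return G versus a one-step bootstrapped estimate, which is what lets the update happen before the episode ends.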

4 – TD Control: Sarsa, Part 1

In this video, we’ll discuss an algorithm that doesn’t need us to complete an entire episode before updating the Q-table. Instead, we’ll update the Q-table while the episode is unfolding. In particular, we’ll only need a very small time window of information to do an update, and so here’s the idea. …
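That small window is the tuple (state, action, reward, next state, next action) that gives Sarsa its name. A sketch of one episode built on that idea, reusing the hypothetical epsilon_greedy helper from the Sarsamax sketch above and the same classic Gym-style 4-tuple step API:

```python
def sarsa_episode(env, Q, nA, alpha, gamma, epsilon):
    """One episode of Sarsa: each update only needs the window
    (state, action, reward, next_state, next_action)."""
    state = env.reset()
    action = epsilon_greedy(Q, state, nA, epsilon)
    done = False
    while not done:
        next_state, reward, done, _ = env.step(action)  # classic Gym 4-tuple API
        if done:
            # Terminal transition: no next action value to bootstrap with
            Q[state][action] += alpha * (reward - Q[state][action])
        else:
            next_action = epsilon_greedy(Q, next_state, nA, epsilon)
            Q[state][action] += alpha * (reward + gamma * Q[next_state][next_action]
                                         - Q[state][action])
            state, action = next_state, next_action
```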

3 – Quiz: MC Control Methods

In this lesson, we’ll draft several new algorithms to solve the reinforcement learning problem. We’ll begin by reviewing how Monte Carlo control works, using our small gridworld example. Remember that we keep track of a Q-table; for each state-action pair, it contains the return that we expect to get. To update the Q-table …
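As a reminder of what that Q-table looks like in code, here is one common representation together with an every-visit, constant-alpha Monte Carlo update over a sampled episode; the structure, names, and the choice of nA = 4 are assumptions for illustration:

```python
from collections import defaultdict
import numpy as np

nA = 4  # assumed number of actions for the small gridworld
# Q-table: each state maps to an array of expected returns, one per action.
Q = defaultdict(lambda: np.zeros(nA))

def mc_control_update(Q, episode, alpha, gamma):
    """Every-visit, constant-alpha MC: walk the sampled episode backwards,
    accumulate the discounted return G, and nudge each Q(s, a) toward it.
    `episode` is a list of (state, action, reward) tuples."""
    G = 0.0
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        Q[state][action] += alpha * (G - Q[state][action])
```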

2 – Gridworld Example

To illustrate the algorithms we’ll discuss in this lesson, it’ll help to work with a small example of a reinforcement learning task. So, say we have an agent in a world with only four possible states, marked here by stone, brick, wood, or grass. Say that at the beginning of an episode, the agent …
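One hypothetical way to encode such a four-state world in code; the 2x2 layout, the transitions, and the reward are placeholders, since the excerpt cuts off before the lesson specifies them:

```python
# Hypothetical encoding of the four-state gridworld. The actual layout and
# reward scheme come later in the lesson; the values here are placeholders.
STATES = ["stone", "brick", "wood", "grass"]
ACTIONS = ["up", "down", "left", "right"]

# Assumed deterministic transitions on a 2x2 grid laid out as:
#   stone  brick
#   wood   grass
GRID = {("stone", "right"): "brick", ("stone", "down"): "wood",
        ("brick", "left"): "stone", ("brick", "down"): "grass",
        ("wood", "up"): "stone",    ("wood", "right"): "grass",
        ("grass", "up"): "brick",   ("grass", "left"): "wood"}

def step(state, action):
    """Move if the action leads somewhere; otherwise stay put."""
    next_state = GRID.get((state, action), state)
    reward = -1  # placeholder; the lesson assigns its own rewards
    return next_state, reward
```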

1 – Introduction

In this lesson, you will learn about Temporal Difference (TD) learning. In order to understand TD learning, it will help to discuss what exactly it would mean to solve this problem of learning from interaction. The solution will come many years into the future, when we’ve developed artificially intelligent agents that interact with the …