## 5 – M3L3 C05 V2

As you’ve learned, we can express the expected return as a probability-weighted sum, where we take into account the probability of each possible trajectory and the return of each trajectory. Our goal is to find the value of theta that maximizes expected return. One way to do that is by gradient ascent, where we just …
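The probability-weighted sum and the gradient-ascent update described here can be sketched as follows (the symbol $U(\theta)$ for the expected return, $P(\tau;\theta)$ for the trajectory probability, and $R(\tau)$ for the trajectory's return are assumed notation, not taken from this excerpt):

```latex
U(\theta) = \sum_{\tau} P(\tau;\theta)\, R(\tau)
\qquad
\theta \leftarrow \theta + \alpha \nabla_\theta U(\theta)
```

Here $\alpha$ is a step size; gradient ascent repeatedly nudges $\theta$ in the direction that increases $U(\theta)$.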

## 4 – M3L3 C04 V2

Now that we have the big picture of how the policy gradient method will work, we’re ready to get more specific. We’ll build slowly and carefully, and I strongly encourage you to keep the big picture in mind as the mathematical details unfold over the next several videos. The first thing we need to define …

## 3 – M3L3 C03 V2

Before moving on, let’s talk a little bit more about what we just did and how it’s related to supervised learning. As we discussed in the previous video, we begin by playing the game for an episode. If we make it to the other end of the street safely and in time, then we win …

## 2 – M3L3 C02 V6

By the end of the last video, we had discussed a game that we’d like to teach an agent to play. There were four possible actions corresponding to moving up, down, left, or right. The output layer of our neural network had a node for each possible action. The weights begin with initially random values …
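A minimal sketch of the network described above: one output node per action, with randomly initialized weights. The state size, hidden-layer size, and state encoding are illustrative assumptions, not taken from the video.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 8    # hypothetical size of the game-state encoding
HIDDEN = 16      # illustrative hidden-layer width
N_ACTIONS = 4    # up, down, left, right

# Weights begin with initially random values, as in the video.
W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))

def policy(state):
    """Return a probability for each of the four actions (softmax output)."""
    h = np.tanh(state @ W1)          # hidden layer
    logits = h @ W2                  # one logit per action node
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()           # probabilities sum to 1

probs = policy(rng.normal(size=STATE_DIM))
# probs has one entry per action; the agent samples its move from it
```

Sampling an action is then just `rng.choice(N_ACTIONS, p=probs)`; training adjusts `W1` and `W2` so good actions become more probable.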

## 1 – M3L3 C01 V3

Hello and welcome to this lesson on policy gradient methods. In the previous lesson, you learned all about policy-based methods. Remember, policy-based methods are a class of algorithms that search directly for the optimal policy without simultaneously maintaining value function estimates. You learned how to represent the policy as a neural network, and in that …