8 – Value Iteration

So far, we have talked about policy iteration, and we have also learned about truncated policy iteration. In truncated policy iteration, the policy evaluation step is permitted only a limited number of sweeps through the state space. In other words, we limit the number of times that the estimated value of each state is updated before proceeding to …
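As a rough illustration of limiting the number of evaluation sweeps, here is a minimal sketch in plain Python. The function name `truncated_policy_evaluation`, the `max_sweeps` parameter, and the tiny two-state MDP are all hypothetical and not taken from the lesson. The transition table is assumed to follow the Gym-style layout `P[s][a] = [(prob, next_state, reward, done), ...]`.

```python
def truncated_policy_evaluation(P, policy, V, max_sweeps=1, gamma=1.0):
    """Apply at most `max_sweeps` sweeps of the Bellman expectation update to V.

    P      : assumed layout P[s][a] = [(prob, next_state, reward, done), ...]
    policy : policy[s][a] is the probability of taking action a in state s
    V      : list of current value estimates, updated in place
    """
    for _ in range(max_sweeps):
        for s in range(len(P)):
            v = 0.0
            for a, action_prob in enumerate(policy[s]):
                for prob, next_state, reward, done in P[s][a]:
                    # Do not bootstrap from the next state once the episode has ended.
                    v += action_prob * prob * (reward + gamma * V[next_state] * (not done))
            V[s] = v
    return V


# Hypothetical two-state MDP: state 1 is terminal.
# In state 0, action 0 stays put (reward 0) and action 1 ends the episode (reward 1).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
}
policy = [[0.5, 0.5], [0.5, 0.5]]  # equiprobable random policy

one_sweep = truncated_policy_evaluation(P, policy, [0.0, 0.0], max_sweeps=1, gamma=0.9)
two_sweeps = truncated_policy_evaluation(P, policy, [0.0, 0.0], max_sweeps=2, gamma=0.9)
print(one_sweep, two_sweeps)
```

With more sweeps the estimate for state 0 moves closer to the policy's true value; stopping early trades accuracy for less computation per evaluation step.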

7 – Truncated Policy Iteration

Congratulations! You’ve implemented your first algorithm that can solve an MDP. For the remainder of this lesson, we’ll look at some variations of this algorithm, and you’ll have the chance to implement all of them to compare their performance. We’ll begin by looking at this policy evaluation step. Remember that policy evaluation is an iterative …

6 – Policy Iteration

At this point in the lesson, you’ve used policy evaluation to determine how good a policy is by calculating its value function. You’ve also used policy improvement, which uses the value function for a policy to construct a new policy that’s better than or equal to the current one. I mentioned that it will make …
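To make the policy improvement step concrete, here is a minimal sketch in plain Python that builds a greedy policy from a value function. The names `q_from_v` and `policy_improvement` and the two-state MDP are hypothetical, and the transition table is assumed to use the Gym-style layout `P[s][a] = [(prob, next_state, reward, done), ...]`.

```python
def q_from_v(P, V, s, gamma=1.0):
    """Return the action values for state s implied by the state values V."""
    return [
        sum(prob * (reward + gamma * V[next_state] * (not done))
            for prob, next_state, reward, done in P[s][a])
        for a in range(len(P[s]))
    ]


def policy_improvement(P, V, gamma=1.0):
    """Build a deterministic policy that is greedy with respect to V.

    Ties are broken in favor of the lowest-numbered action.
    """
    policy = []
    for s in range(len(P)):
        q = q_from_v(P, V, s, gamma)
        best = q.index(max(q))
        policy.append([1.0 if a == best else 0.0 for a in range(len(q))])
    return policy


# Hypothetical two-state MDP: state 1 is terminal.
# In state 0, action 0 stays put (reward 0) and action 1 ends the episode (reward 1).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
}
V = [0.5, 0.0]  # illustrative value estimates for some earlier policy

improved = policy_improvement(P, V, gamma=0.9)
print(improved)
```

In state 0 the greedy policy picks action 1, since ending the episode for a reward of 1 is worth more than staying put under these value estimates.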

5 – Policy Improvement

I hope you enjoyed implementing iterative policy evaluation in the first part of the mini project. Feel free to use your algorithm to evaluate a policy for any finite MDP of your choosing. You need not confine yourself to the Frozen Lake environment. Just remember that policy evaluation requires the agent to have full knowledge …

3 – An Iterative Method

Let’s build off the grid world example and investigate how we might determine the value function corresponding to a particular policy. To begin, we’ll enumerate the states. So, state S1 is the state in the top left corner, then S2, S3, and S4. Say we’re trying to evaluate the stochastic policy where the agent selects …

2 – Another Gridworld Example

Let’s begin with a very small world and an agent who lives in it. The world is primarily composed of nice patches of grass, but one of the four locations in the world has a large mountain. We can think of each of these four possible locations in the world as states in the environment. …

1 – Introduction

For this lesson, we’ll confine our attention to a problem that’s slightly easier than the reinforcement learning problem. Instead of working in a setting where the agent has to learn from interaction, we’ll assume that the agent already knows everything about the environment. So the agent knows how the environment decides the next state, and …