## 1-7-7. Quiz: State-Value Functions

In this quiz, you will calculate the value function corresponding to a particular policy. Each of the nine states in the MDP is labeled as one of $\mathcal{S}^+ = \{s_1, s_2, \ldots, s_9 \}$, where $s_9$ is a terminal state. Consider the (deterministic) policy that is indicated (in orange) in the figure below.
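
Since the policy is deterministic, each state has exactly one successor, so every state value is just the discounted sum of rewards along the single path to the terminal state. A minimal sketch of that calculation, using a hypothetical three-state chain in place of the quiz's gridworld (the actual layout and rewards are given in the figure):

```python
# Under a deterministic policy, each state has one successor and one reward,
# so v_pi(s) is the discounted sum of rewards along the single path.

def state_values(next_state, reward, terminal, gamma=1.0):
    """Compute v_pi for a deterministic policy by following each path."""
    values = {}
    for s in next_state:
        v, cur, discount = 0.0, s, 1.0
        while cur not in terminal:
            v += discount * reward[cur]
            discount *= gamma
            cur = next_state[cur]
        values[s] = v
    return values

# Hypothetical toy transitions: s1 -> s2 -> s3 (terminal), reward -1 per move.
next_state = {"s1": "s2", "s2": "s3"}
reward = {"s1": -1, "s2": -1}
print(state_values(next_state, reward, terminal={"s3"}))
# {'s1': -2.0, 's2': -1.0}
```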

## 1-7-6. Bellman Equations

In this gridworld example, once the agent selects an action, it always moves in the chosen direction (in contrast to general MDPs, where the agent doesn’t always have complete control over what the next state will be), and the reward can be predicted with complete certainty (in contrast to general MDPs, where the reward is a random …
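
In this deterministic setting, the Bellman equation reduces to $v_\pi(s) = r + \gamma\, v_\pi(s')$, where $r$ and $s'$ are the (certain) reward and next state. A sketch of sweeping that update to a fixed point, on a toy four-state chain rather than the course's gridworld:

```python
# Deterministic Bellman update: v(s) = r + gamma * v(s').
# Toy example (not the course's gridworld): states 0..3, state 3 terminal.
gamma = 0.9
next_state = [1, 2, 3, 3]   # deterministic successor under the policy
reward = [-1, -1, 10, 0]    # reward for the single action taken in each state

v = [0.0] * 4
# Sweep the update until the values stop changing.
for _ in range(100):
    v = [reward[s] + gamma * v[next_state[s]] if s != 3 else 0.0
         for s in range(4)]
print([round(x, 2) for x in v])
# [6.2, 8.0, 10.0, 0.0]
```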

## 1-7-3. Quiz: Interpret the Policy

A policy determines how an agent chooses an action in response to the current state. In other words, it specifies how the agent responds to situations that the environment has presented. Consider the recycling robot MDP from the previous lesson.
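
For a deterministic policy like the robot's, a policy is simply a mapping from states to actions. A minimal sketch using the recycling robot's state names; the particular choices below are illustrative, not the quiz's answer:

```python
# A deterministic policy is just a mapping from states to actions.
# States are the robot's battery levels; the action choices here are
# illustrative only.
policy = {"high": "search", "low": "recharge"}

def act(state):
    """Return the action the policy prescribes in this state."""
    return policy[state]

print(act("low"))  # recharge
```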

## 1-7-1. Introduction

You’ve already learned how to formulate a real-world problem so that it can be solved with reinforcement learning. In this lesson, you’ll begin to think about ways to solve this problem. It’s important to note that this lesson is significantly more technical than the previous one. If this is initially uncomfortable, don’t worry, …

## 8 – Optimal Policies

Several concepts ago, I mentioned that we needed to define the action-value function before talking about how the agent could search for an optimal policy; we will save most of the detail for a later lesson. The main idea is this: the agent interacts with the environment, and from that interaction, it estimates …

## 7 – Action-Value Functions

So far we’ve been working with the state-value function for a policy. For each state $s$, it yields the expected discounted return if the agent starts in state $s$ and then uses the policy to choose its actions for all time steps. You’ve seen a few examples and know how to calculate the state …
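
For reference, the action-value function $q_\pi(s, a)$ gives the expected return from starting in state $s$, taking action $a$, and then following $\pi$. In a deterministic MDP it relates to the state-value function by $q_\pi(s, a) = r + \gamma\, v_\pi(s')$, so a known $v_\pi$ lets you fill in the whole action-value table. A sketch with made-up numbers (not the course example):

```python
# In a deterministic MDP, q_pi(s, a) = r(s, a) + gamma * v_pi(s').
# The states, actions, and numbers below are illustrative only.
gamma = 1.0
v = {"A": -2.0, "B": -1.0, "goal": 0.0}   # a known v_pi
# (reward, next_state) for each (state, action) pair
dynamics = {
    ("A", "right"): (-1, "B"),
    ("A", "down"):  (-3, "goal"),
    ("B", "right"): (-1, "goal"),
}

q = {(s, a): r + gamma * v[s2] for (s, a), (r, s2) in dynamics.items()}
print(q[("A", "right")], q[("A", "down")])
# -2.0 -3.0
```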

## 6 – Optimality

So far in this lesson, we’ve looked at a particular policy $\pi$ and calculated its corresponding value function. In the quiz, you calculated the value function corresponding to a different policy, which we denoted by $\pi'$. And if you look at each of these value functions, you may notice a pattern or trend. Take the …
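
The pattern being hinted at is the standard ordering over policies: $\pi'$ is at least as good as $\pi$ exactly when its value function is at least as large in every state,

```latex
\pi' \geq \pi \quad \Longleftrightarrow \quad v_{\pi'}(s) \geq v_{\pi}(s) \ \text{ for all } s \in \mathcal{S},
```

and an optimal policy is one that is at least as good as every other policy under this ordering.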

## 5 – Bellman Equations

If you take the time yourself to calculate the value function for this policy, you might notice that you don’t need to start your calculations from scratch every time. In particular, you don’t need to look at the first state and add up all the rewards along the way, then look at the second state, …
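
This reuse is exactly what the Bellman equation buys: each state's value is its immediate reward plus the (discounted) value of the next state, so values computed for later states never need to be recomputed. A memoized sketch on a toy chain (not the course gridworld):

```python
# Each state's value reuses the next state's value instead of re-summing
# the whole reward path. Toy chain: s1 -> s2 -> s3 -> s4 (terminal).
from functools import lru_cache

next_state = {"s1": "s2", "s2": "s3", "s3": "s4"}
reward = {"s1": -1, "s2": -1, "s3": 5}
gamma = 1.0

@lru_cache(maxsize=None)
def v(s):
    """Bellman recursion: v(s) = r + gamma * v(s'), memoized."""
    if s == "s4":           # terminal state has value zero
        return 0.0
    return reward[s] + gamma * v(next_state[s])

print(v("s1"))  # -1 + -1 + 5 = 3.0
```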

## 4 – State-Value Functions

So we’re working with this gridworld example and looking for the best policy that leads us to a goal state as quickly as possible. So, let’s start with a very, very bad policy so that we can understand why it’s bad, and then work to improve it. Specifically, we’ll look at a policy where …

## 3 – Gridworld Example

To understand how to go about searching for the best policy, it will help to have a running example. So consider this very, very small world and an agent who lives in it. Say the world is primarily composed of nice patches of grass, but two out of the nine locations in the world have …

## 2 – Policies

We’ve seen that we use a Markov decision process, or MDP, as a formal definition of the problem that we’d like to solve with reinforcement learning. In this video, we specify a formal definition for the solution to this problem. We can start to think of the solution as a series of actions that need …
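
Concretely, a deterministic policy maps each state to a single action, while a stochastic policy maps each state to a probability distribution over actions. A minimal sketch with hypothetical state and action names:

```python
# Deterministic policy: one action per state.
# Stochastic policy: a probability distribution over actions per state.
# State/action names here are illustrative only.
import random

deterministic = {"s1": "up", "s2": "right"}

stochastic = {"s1": {"up": 0.7, "right": 0.3},
              "s2": {"right": 1.0}}

def sample_action(policy, state, rng=random.random):
    """Draw an action from the policy's distribution for this state."""
    probs = policy[state]
    x, cum = rng(), 0.0
    for action, p in probs.items():
        cum += p
        if x < cum:
            return action
    return action  # guard against floating-point rounding

print(deterministic["s1"], sample_action(stochastic, "s2"))
```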