## 1-6-19. Summary

The Setting, Revisited. The reinforcement learning (RL) framework is characterized by an agent learning to interact with its environment. At each time step, the agent receives the environment’s state (the environment presents a situation to the agent), and the agent must choose an appropriate action in response. One time step later, the agent receives a reward (the environment indicates whether the agent has …

## 1-6-18. Finite MDPs

Please use this link to peruse the available environments in OpenAI Gym. The environments are indexed by Environment Id, and each environment has a corresponding Observation Space, Action Space, Reward Range, tStepL, Trials, and rThresh. CartPole-v0: find the row in the table that corresponds to the CartPole-v0 environment, and take note of the corresponding Observation Space (Box(4,)) and Action Space (Discrete(2)). Observation Space: the observation space for the CartPole-v0 environment …
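To make the table entries concrete, here is a minimal sketch of what a Box(4,) observation space and a Discrete(2) action space represent. The classes below are illustrative stand-ins, not the real gym.spaces API, and the numeric bounds are assumptions for illustration (in the actual CartPole-v0 environment the velocity components are unbounded):

```python
import random

# Minimal stand-ins for Gym's Box(4,) and Discrete(2) space types.
# These are illustrative sketches, NOT the real gym.spaces API.
class Box:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.shape = (len(low),)

    def sample(self):
        # Draw one value uniformly per dimension, within the stated bounds.
        return [random.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

class Discrete:
    def __init__(self, n):
        self.n = n

    def sample(self):
        # Pick one of the n discrete actions uniformly at random.
        return random.randrange(self.n)

# CartPole-v0: four continuous observations (cart position, cart velocity,
# pole angle, pole angular velocity) and two discrete actions (push left,
# push right). The bounds here are placeholder assumptions.
observation_space = Box(low=[-4.8, -10.0, -0.418, -10.0],
                        high=[4.8, 10.0, 0.418, 10.0])
action_space = Discrete(2)
```

Sampling from either space mimics what an agent does when it explores at random: `observation_space.sample()` returns a length-4 list, and `action_space.sample()` returns 0 or 1.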

## Quiz: One-Step Dynamics, Part 2

It will prove convenient to represent the environment’s dynamics using mathematical notation. In this concept, we will introduce this notation (which can be used for any reinforcement learning task) and use the recycling robot as an example.
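For reference, the standard notation from Sutton and Barto defines the one-step dynamics of the environment as the joint probability of the next state and reward, given the current state and action:

```latex
p(s', r \mid s, a) \doteq \mathbb{P}\left( S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a \right)
```

Because this is a probability distribution over all possible next states and rewards, it satisfies

```latex
\sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) = 1
\quad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}(s).
```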

## 1-6-15. Quiz: One-Step Dynamics, Part 1

Consider the recycling robot example. In the previous concept, we described one method that the environment could use to decide the next state and reward at any time step. Say that at an arbitrary time step $t$, the state of the robot’s battery is high ($S_t = \text{high}$). Then, in response, the agent decides to …
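The recycling robot's one-step dynamics (Sutton and Barto, Example 3.3) can be sketched as a lookup table mapping each state–action pair to its possible outcomes. The parameter values below are placeholders; only the structure of the table comes from the example:

```python
ALPHA, BETA = 0.8, 0.6          # placeholder transition probabilities
R_SEARCH, R_WAIT = 1.0, 0.5     # placeholder rewards (r_search > r_wait)

# One-step dynamics for the recycling robot: each (state, action) pair
# maps to a list of (next_state, reward, probability) triples.
dynamics = {
    ('high', 'search'):  [('high', R_SEARCH, ALPHA), ('low', R_SEARCH, 1 - ALPHA)],
    ('high', 'wait'):    [('high', R_WAIT, 1.0)],
    ('low', 'search'):   [('low', R_SEARCH, BETA), ('high', -3.0, 1 - BETA)],
    ('low', 'wait'):     [('low', R_WAIT, 1.0)],
    ('low', 'recharge'): [('high', 0.0, 1.0)],
}

def p(next_state, reward, state, action):
    """Return p(s', r | s, a) from the dynamics table."""
    return sum(prob for s2, r, prob in dynamics[(state, action)]
               if s2 == next_state and r == reward)
```

For example, `p('high', R_SEARCH, 'high', 'search')` returns `ALPHA`: searching with a high battery keeps the battery high with probability alpha. The outcome probabilities for every state–action pair sum to 1, as the notation requires.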

## 1-6-13. MDPs, Part 1

Over the next several videos, you’ll learn how to rigorously define a reinforcement learning problem as a Markov Decision Process (MDP). Toward this goal, we’ll begin with an example! Notes: in general, the state space $\mathcal{S}$ is the set of all nonterminal states. In continuing tasks (like the recycling task detailed in the video), this is …
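The distinction the excerpt draws can be stated compactly in Sutton and Barto's notation:

```latex
\mathcal{S} = \{\text{all nonterminal states}\}, \qquad
\mathcal{S}^{+} = \mathcal{S} \cup \{\text{terminal states}\} \quad (\text{episodic tasks only}).
```

In continuing tasks there are no terminal states, so $\mathcal{S}$ by itself contains every state the agent can encounter.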

## 1-6-12. Quiz: Pole-Balancing

In this classic reinforcement learning task, a cart is positioned on a frictionless track, and a pole is attached to the top of the cart. The objective is to keep the pole from falling over by moving the cart either left or right, without the cart falling off the track. In the OpenAI Gym implementation, the agent …

## 1-6-9. Quiz: Goals and Rewards

So far, you’ve seen one example of how to frame an agent’s goal as the maximization of expected cumulative reward. In this quiz, you will investigate several more examples.

## 1-6-8. Goals and Rewards, Part 2

If you’d like to learn more about the research that was done at DeepMind, please check out this link. The research paper can be accessed here. Also, check out this cool video!

## 1-6-5. Quiz: Episodic or Continuing?

Remember: a task is an instance of the reinforcement learning (RL) problem. Continuing tasks are tasks that continue forever, without end. Episodic tasks are tasks with a well-defined starting and ending point; in this case, we refer to a complete sequence of interaction, from start to finish, as an episode. Episodic tasks come to an end …

## 1-6-4. Quiz: Test Your Intuition

Playing Chess. Say you are an agent, and your goal is to play chess. At every time step, you choose any action from the set of possible moves in the game. Your opponent is part of the environment; she responds with her own move, and the state you receive at the next time step is …

## 9 – MDPs, Part 1

So far, you’ve just started a conversation to set the stage for what we’d like to accomplish. We’ll use the remainder of this lesson to specify a rigorous definition for the reinforcement learning problem. For context, we’ll work with the example of a recycling robot from the Sutton textbook. So consider a robot that’s designed …