1-6-19. Summary

The Setting, Revisited: The reinforcement learning (RL) framework is characterized by an agent learning to interact with its environment. At each time step, the agent receives the environment’s state (the environment presents a situation to the agent), and the agent must choose an appropriate action in response. One time step later, the agent receives a reward (the environment indicates whether the agent has …
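The interaction described in this excerpt (state, then action, then reward one time step later) can be sketched as a short loop. This is a minimal illustration only: `ToyEnvironment`, its reward rule, and both policies are hypothetical and invented for this sketch; they are not taken from the course material.

```python
class ToyEnvironment:
    """Hypothetical two-state environment, invented purely for illustration."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.steps = 0
        self.state = 0
        return self.state

    def step(self, action):
        """Apply the agent's action; return (next_state, reward, done)."""
        # Arbitrary rule: reward 1.0 when the action equals the current state.
        reward = 1.0 if action == self.state else 0.0
        self.steps += 1
        self.state = (self.state + 1) % 2  # states alternate 0, 1, 0, ...
        done = self.steps >= self.max_steps
        return self.state, reward, done


def run_episode(env, policy):
    """Run one episode and return the sum of rewards received."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                   # agent picks an action for the state
        state, reward, done = env.step(action)   # reward arrives one step later
        total_reward += reward
    return total_reward


def matching_policy(state):
    """Always choose the action equal to the current state."""
    return state


def always_zero_policy(state):
    """Ignore the state and always choose action 0."""
    return 0


best_total = run_episode(ToyEnvironment(), matching_policy)
naive_total = run_episode(ToyEnvironment(), always_zero_policy)
```

In this toy setup the policy that uses the state collects more reward than the one that ignores it, which is the point of the loop: the reward is the environment's feedback on how appropriate each action was.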

1-6-18. Finite MDPs

Please use this link to peruse the available environments in OpenAI Gym. The environments are indexed by Environment Id, and each environment has corresponding Observation Space, Action Space, Reward Range, tStepL, Trials, and rThresh. CartPole-v0: Find the line in the table that corresponds to the CartPole-v0 environment. Take note of the corresponding Observation Space (Box(4,)) and Action Space (Discrete(2)). Observation Space: The observation space for the CartPole-v0 environment …
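To make the `Box(4,)` and `Discrete(2)` notation in this excerpt concrete, here is a minimal sketch in plain Python. The `Discrete` and `Box` classes below are simplified, hypothetical stand-ins written for this note, not Gym's own classes. They only mimic the idea that `Discrete(2)` contains the integers 0 and 1 and that `Box(4,)` contains vectors of four real numbers. The bounds are placeholders, not CartPole's actual limits.

```python
import random


class Discrete:
    """Stand-in for a discrete space: the integers 0, 1, ..., n - 1."""

    def __init__(self, n):
        self.n = n

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

    def sample(self, rng):
        return rng.randrange(self.n)


class Box:
    """Stand-in for a box space: fixed-length vectors of bounded real numbers."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low = list(low)
        self.high = list(high)
        self.shape = (len(self.low),)

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )

    def sample(self, rng):
        return [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]


# Shapes mirror the excerpt: two possible actions, four-number observations.
action_space = Discrete(2)
observation_space = Box(low=[-1.0] * 4, high=[1.0] * 4)  # placeholder bounds

rng = random.Random(0)
action = action_space.sample(rng)            # an integer, either 0 or 1
observation = observation_space.sample(rng)  # a list of four floats
```

The takeaway is the shape of the data: an action is a single integer choice, while an observation is a vector of four continuous values.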

1-6-13. MDPs, Part 1

Over the next several videos, you’ll learn all about how to rigorously define a reinforcement learning problem as a Markov Decision Process (MDP). Towards this goal, we’ll begin with an example! Notes: In general, the state space $\mathcal{S}$ is the set of all nonterminal states. In continuing tasks (like the recycling task detailed in the video), this is …

1-6-5. Quiz: Episodic or Continuing?

Remember: A task is an instance of the reinforcement learning (RL) problem. Continuing tasks are tasks that continue forever, without end. Episodic tasks are tasks with a well-defined starting and ending point. In this case, we refer to a complete sequence of interaction, from start to finish, as an episode. Episodic tasks come to an end …

9 – MDPs, Part 1

So far, you’ve just started a conversation to set the stage for what we’d like to accomplish. We’ll use the remainder of this lesson to specify a rigorous definition for the reinforcement learning problem. For context, we’ll work with the example of a recycling robot from the Sutton textbook. So consider a robot that’s designed …