1-6-13. MDPs, Part 1


Over the next several videos, you’ll learn all about how to rigorously define a reinforcement learning problem as a Markov Decision Process (MDP).

Towards this goal, we’ll begin with an example!


In general, the state space $\mathcal{S}$ is the set of all nonterminal states.

In continuing tasks (like the recycling task detailed in the video), this is equivalent to the set of all states.

In episodic tasks, we use $\mathcal{S}^+$ to refer to the set of all states, including terminal states.

The action space $\mathcal{A}$ is the set of possible actions available to the agent.

In the event that there are some states where only a subset of the actions is available, we can also use $\mathcal{A}(s)$ to refer to the set of actions available in state $s\in\mathcal{S}$.
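To make the notation concrete, here is a minimal sketch of these sets for the recycling robot mentioned above. The specific state names (`high`, `low`) and per-state action sets are assumptions based on the classic recycling-robot example from Sutton & Barto; the video may use slightly different labels.

```python
# Sketch of the recycling robot's spaces (state/action names assumed
# from the classic Sutton & Barto example, not taken from the video).

# Continuing task: the state space S is the set of all states.
S = {"high", "low"}  # battery charge levels

# Not every action is available in every state, so we define A(s):
# the robot can only recharge when its battery is low.
AVAILABLE_ACTIONS = {
    "high": {"search", "wait"},
    "low": {"search", "wait", "recharge"},
}

def A(s):
    """Return the set of actions available in state s."""
    return AVAILABLE_ACTIONS[s]

# The full action space is the union of A(s) over all states.
A_full = set().union(*(A(s) for s in S))
print(A_full)  # {'search', 'wait', 'recharge'} (set order may vary)
```

Because this is a continuing task, $\mathcal{S}$ already contains every state; in an episodic task we would additionally track the terminal states in $\mathcal{S}^+$.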
