Finite MDPs
Please use this link to peruse the available environments in OpenAI Gym.

The environments are indexed by Environment Id, and each environment has a corresponding Observation Space, Action Space, Reward Range, tStepL, Trials, and rThresh.
CartPole-v0
Find the line in the table that corresponds to the CartPole-v0 environment. Take note of the corresponding Observation Space (Box(4,)) and Action Space (Discrete(2)).

Observation Space
The observation space for the CartPole-v0 environment has type Box(4,). Thus, the observation (or state) at each time point is an array of 4 numbers. You can look up what each of these numbers represents in this document. After opening the page, scroll down to the description of the observation space.
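As a rough illustration of what "an array of 4 numbers" means in practice, the sketch below pairs each entry of an observation with a label. The label order (cart position, cart velocity, pole angle, pole velocity at tip) is an assumption based on the CartPole documentation, so confirm it against the linked document. The helper `label_observation` and the sample values are hypothetical, and the code deliberately avoids importing Gym so it runs anywhere. With Gym installed, the space itself can be inspected through `env.observation_space` on an environment created with `gym.make("CartPole-v0")`.

```python
from typing import Dict, Sequence

# Assumed component order for a CartPole-v0 observation
# (check the environment's documentation to confirm).
OBSERVATION_LABELS = (
    "cart position",
    "cart velocity",
    "pole angle",
    "pole velocity at tip",
)


def label_observation(observation: Sequence[float]) -> Dict[str, float]:
    """Pair each of the 4 observation entries with its label (hypothetical helper)."""
    if len(observation) != len(OBSERVATION_LABELS):
        raise ValueError("expected 4 values, matching Box(4,)")
    return dict(zip(OBSERVATION_LABELS, observation))


# Made-up observation values, for illustration only.
example = [0.02, -0.01, 0.03, 0.04]
for name, value in label_observation(example).items():
    print(f"{name}: {value}")
```

Because every entry is a real number, the set of possible observations is continuous, unlike the two-element action space.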

Action Space
The action space for the CartPole-v0 environment has type Discrete(2). Thus, at any time point, there are only two actions available to the agent. You can look up what each of these numbers represents in this document (note that it is the same document you used to look up the observation space!). After opening the page, scroll down to the description of the action space.
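To make the two-action space concrete, here is a minimal sketch that treats Discrete(2) as the integers 0 and 1 and picks among them uniformly at random. The meanings attached to each integer (0 pushes the cart left, 1 pushes it right) are an assumption based on the CartPole documentation, and `sample_action` is a hypothetical stand-in for sampling from the space, not Gym's own API.

```python
import random

# Assumed meaning of each action index in CartPole-v0
# (check the environment's documentation to confirm).
ACTION_MEANINGS = {
    0: "push cart to the left",
    1: "push cart to the right",
}


def sample_action(rng: random.Random) -> int:
    """Pick one of the two available actions uniformly at random (hypothetical helper)."""
    return rng.randrange(len(ACTION_MEANINGS))


rng = random.Random(0)  # seeded so repeated runs pick the same actions
for _ in range(3):
    action = sample_action(rng)
    print(action, ACTION_MEANINGS[action])
```

A random policy like this is only a baseline. An agent that learns would instead choose the action based on the current observation.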

