Finite MDPs
Please use this link to peruse the available environments in OpenAI Gym.
Find the row in the table that corresponds to the CartPole-v0 environment. Take note of the corresponding Observation Space (Box(4,)) and Action Space (Discrete(2)).
The observation space for the CartPole-v0 environment has type Box(4,). Thus, the observation (or state) at each time step is an array of 4 numbers. You can look up what each of these numbers represents in this document. After opening the page, scroll down to the description of the observation space.
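As a rough illustration, the sketch below attaches a name to each of the four entries of a CartPole observation. It assumes the ordering used by the Gym CartPole environment (cart position, cart velocity, pole angle, pole angular velocity; some versions of the documentation call the last one "pole velocity at tip"). The helper `describe_observation` and the sample values are hypothetical. In practice the array would be returned by the environment rather than typed in by hand.

```python
from typing import Dict, Sequence

# Assumed ordering of the four entries of a Box(4,) CartPole observation,
# following the Gym CartPole documentation.
OBSERVATION_LABELS = (
    "cart_position",
    "cart_velocity",
    "pole_angle",
    "pole_angular_velocity",
)


def describe_observation(obs: Sequence[float]) -> Dict[str, float]:
    """Pair each entry of a 4-number observation with a readable name (hypothetical helper)."""
    if len(obs) != len(OBSERVATION_LABELS):
        raise ValueError(f"expected 4 numbers, got {len(obs)}")
    return {name: float(value) for name, value in zip(OBSERVATION_LABELS, obs)}


# Hypothetical observation; a real one would come from the environment.
state = [0.02, -0.15, 0.03, 0.25]
for name, value in describe_observation(state).items():
    print(f"{name:>22}: {value:+.2f}")
```

The point is only that the state is a plain vector of four real numbers. The agent never sees the labels, just the values in a fixed order.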
The action space for the CartPole-v0 environment has type Discrete(2). Thus, at any time step, there are only two actions available to the agent, encoded as the integers 0 and 1. You can look up what each of these numbers represents in this document (note that it is the same document you used to look up the observation space!). After opening the page, scroll down to the description of the action space.
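A similar sketch for the action side: a Discrete(2) space contains exactly the integers 0 and 1, which the Gym CartPole documentation describes as pushing the cart to the left and to the right, respectively. The names `ACTION_MEANINGS`, `is_valid_action`, and `sample_action` are hypothetical. `random.randrange` stands in for what `env.action_space.sample()` does in Gym, namely draw one of the valid actions uniformly at random.

```python
import random

# Meaning of each action in CartPole, as described in the Gym documentation.
ACTION_MEANINGS = {0: "push cart to the left", 1: "push cart to the right"}
N_ACTIONS = 2  # Discrete(2): the valid actions are 0 and 1


def is_valid_action(action: int) -> bool:
    """Return True if `action` belongs to a Discrete(2) space."""
    return isinstance(action, int) and 0 <= action < N_ACTIONS


def sample_action(rng: random.Random) -> int:
    """Pick one of the two actions uniformly at random (stand-in for action_space.sample())."""
    return rng.randrange(N_ACTIONS)


rng = random.Random(0)  # seeded so the run is reproducible
for _ in range(3):
    a = sample_action(rng)
    print(a, "->", ACTION_MEANINGS[a])
```

A uniformly random policy like this is a common first baseline for CartPole: it ignores the observation entirely, so any learned agent should do noticeably better.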