1-6-15. Quiz: One-Step Dynamics, Part 1

Consider the recycling robot example. In the previous concept, we described one method that the environment could use to decide the state and reward, at any time step.

Say at an arbitrary time step $t$, the state of the robot’s battery is high ($S_t = \text{high}$). Then, in response, the agent decides to search ($A_t = \text{search}$). You learned in the previous concept that in this case, the environment responds to the agent by flipping a theoretical coin with 70% probability of landing heads.

  • If the coin lands heads, the environment decides that the next state is high ($S_{t+1} = \text{high}$), and the reward is 4 ($R_{t+1} = 4$).
  • If the coin lands tails, the environment decides that the next state is low ($S_{t+1} = \text{low}$), and the reward is 4 ($R_{t+1} = 4$).
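
The coin flip described above can be sketched in a few lines of Python. This is only an illustration of the single case covered here ($S_t = \text{high}$, $A_t = \text{search}$); the function name `step_high_search` and the constant names are hypothetical and not part of the course material.

```python
import random

# Values taken from the text: the coin lands heads with probability 0.7,
# and the reward is 4 for either outcome.
P_HEADS = 0.7
SEARCH_REWARD = 4


def step_high_search(rng=random):
    """Sample (next_state, reward) for S_t = high, A_t = search.

    Hypothetical helper: `rng` is any object with a random() method
    returning a float in [0, 1), such as the random module itself or
    a seeded random.Random instance.
    """
    heads = rng.random() < P_HEADS          # flip the biased coin
    next_state = "high" if heads else "low"  # heads -> high, tails -> low
    return next_state, SEARCH_REWARD


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the run repeatable
    samples = [step_high_search(rng) for _ in range(10_000)]
    frac_high = sum(state == "high" for state, _ in samples) / len(samples)
    print(f"fraction of next states that are high: {frac_high:.3f}")
```

Over many samples, the fraction of `high` next states should be close to 0.7, while the reward is always 4.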

This is depicted in the figure below.

In fact, for any state $S_t$ and action $A_t$, it is possible to use the figure to determine how the environment will decide the next state $S_{t+1}$ and reward $R_{t+1}$.
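
As a worked example, the case above can be written as conditional probabilities. The shorthand $p(s', r \mid s, a)$ is an assumed notation (it is common in reinforcement learning texts but is not introduced in this section):

$$
p(s', r \mid s, a) \doteq \mathbb{P}\left(S_{t+1} = s',\ R_{t+1} = r \mid S_t = s,\ A_t = a\right)
$$

$$
p(\text{high}, 4 \mid \text{high}, \text{search}) = 0.7, \qquad
p(\text{low}, 4 \mid \text{high}, \text{search}) = 0.3
$$

The two probabilities sum to 1 because heads and tails are the only possible outcomes of the coin flip for this state and action.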
