1-6-19. Summary


The Setting, Revisited

  • The reinforcement learning (RL) framework is characterized by an agent learning to interact with its environment.
  • At each time step, the agent receives the environment’s state (the environment presents a situation to the agent), and the agent must choose an appropriate action in response. One time step later, the agent receives a reward (the environment indicates whether the agent has responded appropriately to the state) and a new state.
  • Every agent has the same goal: to maximize expected cumulative reward, that is, the expected sum of rewards attained over all time steps.
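The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToggleEnv`, its reward rule, and the `run` helper are hypothetical names invented for this example and do not come from any library.

```python
class ToggleEnv:
    """Hypothetical two-state environment (an assumption for illustration).

    The state is 0 or 1. The agent earns a reward of 1 when its action
    equals the current state, and 0 otherwise. The state then flips.
    """

    def __init__(self):
        self.state = 0

    def step(self, action):
        # The reward judges the action taken in the current state.
        reward = 1.0 if action == self.state else 0.0
        # The environment moves on to a new state.
        self.state = 1 - self.state
        return reward, self.state


def run(policy, num_steps):
    """Run the agent-environment loop and return the cumulative reward."""
    env = ToggleEnv()
    state = env.state
    cumulative_reward = 0.0
    for _ in range(num_steps):
        action = policy(state)            # agent responds to the state
        reward, state = env.step(action)  # one step later: reward and new state
        cumulative_reward += reward
    return cumulative_reward


# Two hand-written policies: one copies the state, one always picks action 0.
matching_total = run(lambda s: s, num_steps=10)
always_zero_total = run(lambda s: 0, num_steps=10)
```

In this toy setting the policy that copies the state collects more reward than the one that ignores it, which is the sense in which reward tells the agent whether it responded appropriately.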

Episodic vs. Continuing Tasks

  • A task is an instance of the reinforcement learning (RL) problem.
  • Continuing tasks are tasks in which the interaction goes on forever, with no terminal state.
  • Episodic tasks are tasks with a well-defined starting and ending point.
    • In this case, we refer to a complete sequence of interaction, from start to finish, as an episode.
    • Episodic tasks come to an end whenever the agent reaches a terminal state.
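To make the idea of an episode concrete, here is a minimal sketch of an episodic task. The corridor layout, the reward of 1 for reaching the goal, and the `run_episode` helper are all assumptions made up for this example.

```python
def run_episode(policy, start=0, goal=3, max_steps=100):
    """Run one episode in a hypothetical corridor with positions 0..goal.

    The goal position is the terminal state. Actions are +1 (right) or
    -1 (left). Reaching the goal yields a reward of 1; every other step
    yields 0. max_steps is a safety cap for policies that never arrive.
    """
    state = start
    episode = []  # list of (state, action, reward) tuples
    for _ in range(max_steps):
        if state == goal:
            break  # terminal state reached: the episode is over
        action = policy(state)
        next_state = min(goal, max(0, state + action))
        reward = 1.0 if next_state == goal else 0.0
        episode.append((state, action, reward))
        state = next_state
    return episode, state


# A policy that always moves right finishes the episode in three steps.
episode, final_state = run_episode(lambda s: 1)
```

A continuing task would have no terminal state, so a loop like this one would never reach its `break`.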

The Reward Hypothesis

  • Reward Hypothesis: All goals can be framed as the maximization of (expected) cumulative reward.

Goals and Rewards

  • (Please see Part 1 and Part 2 to review an example of how to specify the reward signal in a real-world problem.)

Cumulative Reward

  • The return at time step $t$ is $G_t := R_{t+1} + R_{t+2} + R_{t+3} + \ldots$
  • The agent selects actions with the goal of maximizing expected (discounted) return. (Note: discounting is covered in the next concept.)
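For a finite episode, the undiscounted return at every time step can be computed in one backward pass, since $G_t = R_{t+1} + G_{t+1}$. The sketch below assumes the rewards are stored in a list where index `t` holds $R_{t+1}$; the function name `returns` is hypothetical.

```python
def returns(rewards):
    """Compute the undiscounted return G_t for every time step t.

    rewards[t] is assumed to hold R_{t+1}, the reward received one step
    after time step t. The result satisfies
    G_t = rewards[t] + rewards[t+1] + ... up to the end of the episode.
    """
    result = [0.0] * len(rewards)
    running_sum = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running_sum += rewards[t]
        result[t] = running_sum
    return result


# Rewards R_1 = 1, R_2 = 0, R_3 = 2 give G_0 = 3, G_1 = 2, G_2 = 2.
example_returns = returns([1.0, 0.0, 2.0])
```

The return at time step 0 equals the cumulative reward for the whole episode, which is the quantity the agent tries to maximize in expectation.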
