# 1-6-19. Summary

### The Setting, Revisited

• The reinforcement learning (RL) framework is characterized by an agent learning to interact with its environment.
• At each time step, the agent receives the environment’s state (the environment presents a situation to the agent), and the agent must choose an appropriate action in response. One time step later, the agent receives a reward (the environment indicates whether the agent has responded appropriately to the state) and a new state.
• Every agent has the same goal: to maximize the expected cumulative reward, that is, the expected sum of rewards attained over all time steps.
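The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CoinFlipEnv`, `copy_state_policy`, and `interact` are hypothetical names invented for this example, not part of any library or of the course material.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the state is a coin flip (0 or 1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # Present the first state to the agent.
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        # Reward 1 if the action matches the current state, otherwise 0.
        reward = 1 if action == self.state else 0
        # The environment then moves on to a new state.
        self.state = self.rng.randint(0, 1)
        return self.state, reward


def copy_state_policy(state):
    """Hypothetical policy: always answer with the state that was shown."""
    return state


def interact(env, policy, num_steps):
    """Run the agent-environment loop and sum the rewards received."""
    state = env.reset()
    total_reward = 0
    for _ in range(num_steps):
        action = policy(state)            # agent responds to the state
        state, reward = env.step(action)  # one step later: reward and new state
        total_reward += reward
    return total_reward


total = interact(CoinFlipEnv(), copy_state_policy, num_steps=10)
print(total)  # 10, because this policy always matches the state
```

In this toy setting the best policy is obvious. In a real problem the agent has to learn, from the rewards alone, which actions lead to a high cumulative reward.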

• A task is an instance of the reinforcement learning (RL) problem.
• Episodic tasks are tasks with a well-defined starting and ending point.
• In this case, we refer to a complete sequence of interaction, from start to finish, as an episode.
• Episodic tasks come to an end whenever the agent reaches a terminal state.
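One way to see how terminal states delimit episodes is to cut a recorded stream of interaction into episodes. The sketch below assumes each transition is stored as a tuple `(state, action, reward, next_state, done)`, where `done` marks arrival at a terminal state. Both this layout and the function name `split_into_episodes` are assumptions made for illustration.

```python
def split_into_episodes(transitions):
    """Group transitions into episodes, closing one at each terminal state.

    Returns (episodes, unfinished), where `unfinished` holds any trailing
    transitions that have not yet reached a terminal state.
    """
    episodes = []
    current = []
    for transition in transitions:
        current.append(transition)
        done = transition[4]
        if done:
            # Terminal state reached: the episode is complete.
            episodes.append(current)
            current = []
    return episodes, current


# Hypothetical log: (state, action, reward, next_state, done)
log = [
    ("s0", "right", 0, "s1", False),
    ("s1", "right", 1, "goal", True),
    ("s0", "left", 0, "s0", False),
    ("s0", "right", 0, "s1", False),
    ("s1", "right", 1, "goal", True),
    ("s0", "right", 0, "s1", False),
]

episodes, unfinished = split_into_episodes(log)
print([len(e) for e in episodes])  # [2, 3]
print(len(unfinished))             # 1
```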

### The Reward Hypothesis

• Reward Hypothesis: All goals can be framed as the maximization of (expected) cumulative reward.

### Goals and Rewards

• (Please see Part 1 and Part 2 to review an example of how to specify the reward signal in a real-world problem.)

### Cumulative Reward

• The return at time step $t$ is $G_t := R_{t+1} + R_{t+2} + R_{t+3} + \ldots$
• The agent selects actions with the goal of maximizing expected (discounted) return. (Note: discounting is covered in the next concept.)
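For a finished episode, the undiscounted return at every time step can be computed with a single backward pass, using the fact that $G_t = R_{t+1} + G_{t+1}$. The following is a minimal sketch. The function name `returns_from_rewards` and the convention that `rewards[k]` stores $R_{k+1}$ are assumptions chosen for this example.

```python
def returns_from_rewards(rewards):
    """Compute undiscounted returns for one finished episode.

    Assumed convention: rewards[k] holds R_{k+1}, and result[t] holds
    G_t = R_{t+1} + R_{t+2} + ... up to the end of the episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        returns[t] = running
    return returns


rewards = [1.0, 0.0, -2.0, 5.0]
print(returns_from_rewards(rewards))  # [4.0, 3.0, 3.0, 5.0]
```

The first entry, $G_0$, is the cumulative reward of the whole episode, which is the quantity the agent tries to maximize in expectation.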
