Summary

The Setting, Revisited
- The reinforcement learning (RL) framework is characterized by an agent learning to interact with its environment.
- At each time step, the agent receives the environment’s state (the environment presents a situation to the agent), and the agent must choose an appropriate action in response. One time step later, the agent receives a reward (the environment indicates whether the agent has responded appropriately to the state) and a new state.
- All agents share the goal of maximizing expected cumulative reward, i.e., the expected sum of rewards attained over all time steps. (A minimal sketch of this interaction loop follows below.)
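
The loop above can be sketched in a few lines of Python. Everything below (the `ToyEnv` environment and the random agent) is a hypothetical illustration of the state → action → reward cycle, not any particular library's API:

```python
import random

class ToyEnv:
    """Hypothetical two-state environment used only to illustrate the loop."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Reward of 1 if the action matches the current state, else 0.
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state            # the new state for the next time step
        return self.state, reward

def random_agent(state, n_actions=2):
    """Chooses an action uniformly at random in response to the state."""
    return random.randrange(n_actions)

env = ToyEnv()
state = env.reset()
total_reward = 0.0
for t in range(10):                            # ten time steps of interaction
    action = random_agent(state)               # agent chooses an action for this state
    state, reward = env.step(action)           # environment returns a new state and a reward
    total_reward += reward                     # cumulative reward the agent tries to maximize
print(total_reward)
```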
Episodic vs. Continuing Tasks
- A task is an instance of the reinforcement learning (RL) problem.
- Continuing tasks are tasks that continue forever, without end.
- Episodic tasks are tasks with a well-defined starting and ending point.
- In this case, we refer to a complete sequence of interaction, from start to finish, as an episode.
- Episodic tasks come to an end whenever the agent reaches a terminal state. (A short sketch of the episodic loop follows below.)
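
In code, the difference shows up in the loop's stopping condition. The sketch below assumes a hypothetical episodic environment whose `step` method reports when a terminal state is reached; for a continuing task there would be no such flag, and the loop would run indefinitely (or for a fixed budget of steps):

```python
import random

class EpisodicToyEnv:
    """Hypothetical episodic environment that terminates after five time steps."""
    def reset(self):
        self.t = 0
        return 0                                # well-defined starting state

    def step(self, action):
        self.t += 1
        reward = random.random()
        done = self.t >= 5                      # terminal state: the episode ends here
        return self.t, reward, done

env = EpisodicToyEnv()
state = env.reset()
done = False
episode_rewards = []
while not done:                                 # episodic task: stop at the terminal state
    state, reward, done = env.step(action=0)
    episode_rewards.append(reward)
print(episode_rewards)                          # the rewards from one complete episode
```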
The Reward Hypothesis
- Reward Hypothesis: All goals can be framed as the maximization of (expected) cumulative reward.
Goals and Rewards
- (Please see Part 1 and Part 2 to review an example of how to specify the reward signal in a real-world problem.)
Cumulative Reward
- The return at time step $t$ is $G_t := R_{t+1} + R_{t+2} + R_{t+3} + \ldots$ (a small worked example follows this list).
- The agent selects actions with the goal of maximizing expected (discounted) return. (Note: discounting is covered in the next concept.)
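
As a small worked example of this definition (with hypothetical reward values; discounting is deliberately omitted, since it is covered next):

```python
def undiscounted_return(rewards):
    """G_t = R_{t+1} + R_{t+2} + R_{t+3} + ... for rewards received after time step t."""
    return sum(rewards)

# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3} observed after time step t.
rewards = [1.0, 0.5, 2.0]
print(undiscounted_return(rewards))             # G_t = 3.5
```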
