3 – Episodic vs. Continuing Tasks

In this course, many of the real-world situations we’ll consider will have a well-defined ending point. For instance, say we’re teaching an agent to play a game. Then the interaction ends when the agent wins or loses. Or we might be running a simulation to teach a car to drive. Then the interaction ends if the car crashes. Of course, not all reinforcement learning tasks have a well-defined ending point, but those that do are called episodic tasks. In this case, we’ll refer to a complete sequence of interaction, from start to finish, as an episode.

When the episode ends, the agent looks at the total amount of reward it received to figure out how well it did. It’s then able to start from scratch, as if it has been completely reborn into the same environment, but now with the added knowledge of what happened in its past life. In this way, as time passes over its many lives, the agent makes better and better decisions, and you’ll see this for yourself in your coding implementations. Once your agents have spent enough time getting to know the environment, they should be able to pick a strategy where the cumulative reward is quite high. In other words, in the context of a game-playing agent, it should be able to achieve a higher score.

So episodic tasks are tasks with a well-defined ending point. We’ll also look at tasks that go on forever, without end. Those are called continuing tasks. For instance, an algorithm that buys and sells stocks in response to the financial market would be best modeled as an agent in a continuing task. In this case, the agent lives forever, so it never gets to stop and start over: it has to learn the best way to choose actions while simultaneously interacting with the environment. The algorithms for this case are slightly more complex and will be covered a bit later in the course. But for now, let’s dig deeper into this idea of reward.
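To make the idea of an episode concrete, here is a minimal sketch of the interaction loop for an episodic task. It is not from the course materials: `CorridorEnv`, `run_episode`, the reward values, and the step limit are all hypothetical names and numbers chosen for illustration. The agent walks along a short corridor, the episode ends when it reaches the last cell (or runs out of steps), and the total reward is added up over the episode.

```python
class CorridorEnv:
    """Hypothetical toy episodic task: walk from cell 0 to the last cell."""

    def __init__(self, length=5, max_steps=20):
        self.length = length        # number of cells in the corridor
        self.max_steps = max_steps  # assumed cap so every episode terminates
        self.position = 0
        self.steps = 0

    def reset(self):
        # Start a new episode: the agent is "reborn" at the first cell.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clip the move.
        self.position = min(max(self.position + action, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        # Assumed reward scheme: +1 at the goal, a small penalty otherwise.
        reward = 1.0 if reached_goal else -0.1
        # The well-defined ending point: goal reached or step limit hit.
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def run_episode(env, policy):
    """Run one episode from start to finish and return its total reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


def always_right(state):
    # A fixed policy that always moves toward the goal.
    return 1


def always_left(state):
    # A fixed policy that never reaches the goal.
    return -1


if __name__ == "__main__":
    env = CorridorEnv()
    print("always_right:", run_episode(env, always_right))
    print("always_left: ", run_episode(env, always_left))
```

Comparing the two totals is how the agent would judge which strategy did better over an episode. In a continuing task there would be no `done` flag to wait for, so the `while` loop would never exit and the agent would have to improve its policy inside the loop instead of between episodes.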
