11 – MDPs, Part 3

Now that we’ve looked at an example, you should have the intuition needed to understand the formal definition of the reinforcement learning framework. Formally, a Markov decision process (MDP) is defined by a set of states, a set of actions, and a set of rewards, along with the one-step dynamics of the environment and the discount rate.

We’ve already detailed the states, actions, rewards, and one-step dynamics of the environment, so it remains to choose the discount rate. Here it is important to notice that the task we’ve detailed is a continuing task. It will therefore prove useful to make the discount rate less than one. Otherwise, the agent would have to look infinitely far into the future, and the sum of rewards it tries to maximize could grow without bound. A discount rate of 0.9 is a common choice, and it feels like a good one here. Throughout this course, you’ll have the opportunity in your implementations to build some intuition for how to set the discount rate. For now, note that the discount rate is typically set to a number much closer to one than to zero. Otherwise, the agent becomes excessively short-sighted.

And now you have fully specified your first MDP. In general, when you have a real-world problem in mind, you will need to specify the MDP. That specification fully and formally defines the problem you want your agent to solve. This framework works for both continuing and episodic tasks. Whenever you have a problem that you want to solve with reinforcement learning, this is the framework you will use, whether the problem involves a self-driving car, a walking robot, or a stock-trading agent.

The agent will know the states and actions, along with the discount rate. The set of rewards and the one-step dynamics, on the other hand, specify how the environment works and will be unknown to the agent. Despite not having this information, the agent will still have to learn from interaction how to accomplish its goal.
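Written out in symbols, the definition and the argument for a discount rate below one look roughly as follows. The notation is our own choice, following common textbook usage rather than anything defined in this lesson. If every reward lies in [-R_max, R_max] and 0 ≤ γ < 1, the discounted return of a continuing task is bounded by a geometric series. The numbers on the last line show why γ near zero makes the agent short-sighted.

```latex
\text{MDP} = (\mathcal{S}, \mathcal{A}, \mathcal{R}, p, \gamma), \qquad
p(s', r \mid s, a) = \Pr\{S_{t+1} = s',\ R_{t+1} = r \mid S_t = s,\ A_t = a\}

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

|R_{t+k+1}| \le R_{\max},\ 0 \le \gamma < 1
\;\Longrightarrow\;
|G_t| \le \sum_{k=0}^{\infty} \gamma^k R_{\max} = \frac{R_{\max}}{1 - \gamma}

\gamma = 0.9:\ \frac{R_{\max}}{1 - \gamma} = 10\,R_{\max},\ \ \gamma^{10} \approx 0.349
\qquad
\gamma = 0.1:\ \gamma^{10} = 10^{-10}
```

With γ = 0.9, a reward ten steps ahead still carries about a third of its weight. With γ = 0.1, that same reward is effectively ignored.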

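The closing point can be sketched in Python: the agent knows the states, actions, and discount rate, while the environment keeps the rewards' dynamics to itself. This is a minimal illustration under stated assumptions, not the course's code. The two-state MDP is hypothetical, including its state names ("high", "low"), action names ("work", "rest"), reward values, and transition probabilities. The names MDP, Environment, and discounted_return are also hypothetical. The environment stores the one-step dynamics p(s', r | s, a) as a table and returns only sampled transitions from step(). The "agent" here just picks random actions and records the rewards it observes.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    """The tuple (S, A, R, p, gamma). All values used below are hypothetical."""
    states: tuple[str, ...]
    actions: tuple[str, ...]
    rewards: tuple[float, ...]
    # One-step dynamics: dynamics[(s, a)] lists (s_next, r, prob) triples,
    # i.e. a table of p(s_next, r | s, a).
    dynamics: dict
    gamma: float

    def check(self) -> None:
        """Raise ValueError unless each p(., . | s, a) is a distribution and 0 <= gamma < 1."""
        for (s, a), outcomes in self.dynamics.items():
            total = sum(prob for _, _, prob in outcomes)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"p(. | {s}, {a}) sums to {total}, not 1")
            for s_next, r, _ in outcomes:
                if s_next not in self.states or r not in self.rewards:
                    raise ValueError(f"unknown outcome {(s_next, r)}")
        if not 0.0 <= self.gamma < 1.0:
            raise ValueError("continuing task: gamma must be in [0, 1)")


class Environment:
    """Holds the full MDP but exposes only sampled transitions via step()."""

    def __init__(self, mdp: MDP, start: str, seed: int = 0):
        self._mdp = mdp  # private by convention: the agent never reads this
        self._rng = random.Random(seed)
        self.state = start

    def step(self, action: str) -> tuple[str, float]:
        # Sample (s_next, r) from p(., . | current state, action).
        outcomes = self._mdp.dynamics[(self.state, action)]
        weights = [prob for _, _, prob in outcomes]
        s_next, r, _ = self._rng.choices(outcomes, weights=weights, k=1)[0]
        self.state = s_next
        return s_next, r


def discounted_return(rewards: list[float], gamma: float) -> float:
    """G = r_1 + gamma * r_2 + gamma**2 * r_3 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical two-state, two-action MDP with discount rate 0.9.
toy = MDP(
    states=("high", "low"),
    actions=("work", "rest"),
    rewards=(0.0, 1.0, 2.0),
    dynamics={
        ("high", "work"): [("high", 2.0, 0.7), ("low", 2.0, 0.3)],
        ("high", "rest"): [("high", 1.0, 1.0)],
        ("low", "work"): [("low", 1.0, 0.6), ("high", 0.0, 0.4)],
        ("low", "rest"): [("high", 0.0, 1.0)],
    },
    gamma=0.9,
)
toy.check()

# The agent is given only toy.states, toy.actions, and toy.gamma.
env = Environment(toy, start="high", seed=42)
agent_rng = random.Random(7)
observed = []
for _ in range(20):
    _, r = env.step(agent_rng.choice(toy.actions))  # random policy, for illustration only
    observed.append(r)
print(f"discounted return over 20 steps: {discounted_return(observed, toy.gamma):.3f}")
```

Storing the dynamics as (next state, reward, probability) triples mirrors p(s', r | s, a) directly. A learning agent would replace the random action choice with a policy that it improves using only the rewards it observes.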