3 – The Setting

Throughout this course, we’ll concern ourselves with the idea of learning from interaction. In the field of reinforcement learning, we refer to the learner or decision maker as the agent. But I really like dogs, so we’ll instead think of the agent as a small puppy born into the world without any understanding of how anything works. When he opens his eyes, the first thing he observes is his owner. Now, say the owner communicates to the puppy how she would like him to behave. The puppy looks at the command and, based on that observation, he’s expected to choose how to respond. Of course, he has a vested interest in responding appropriately, mostly to stay out of trouble but maybe even to get a treat. Now, newborn puppies are complicated creatures, and it’s impossible to count the number of actions he could take at any point in time. Furthermore, he doesn’t yet know what any of the actions do or what effect they will have on the world. So he’ll have to try them out and see what happens.
So how will he respond to his owner’s command? At this point, he has no reason to favor any action over the others. And so, let’s say he just picks one at random, of course with the full understanding that he has no idea what he’s doing. In this case, he decides to sit. After taking this action, he waits for a response. And I have to imagine the wait is pretty scary, as it’s entirely uncertain what will happen next. In response to his action, he receives feedback from his owner. In this case, he receives a single treat. That’s encouraging, and he’s off to a great start. We’ll assume that, in general, his only goal is to maximize reward. He’d like it to be as positive as possible, both now and at every point in the future.
But now the owner has given a new command, and it’s time for him to choose his next response. What do you think? Sitting seemed to perform well. But now the command looks different, so maybe a different action is more appropriate.
He decides to pick an action at random again. In this case, he decides to run, and he waits for a response. And, oh no, he receives discouraging feedback from his owner. Now, it’s unclear whether running is just a bad action in general or whether he simply should not run in response to this particular observation. He can only discover this by interacting more with his owner, systematically proposing and testing hypotheses. And so the process continues: at each point in time, the puppy takes an action and then simultaneously receives feedback and an updated observation from his owner. At each point in time, he does his best to execute the appropriate action to make his owner happy.
Now, at first glance, it could seem that when we view interaction along these lines, everything is pretty simple. After all, the way the puppy interacts with his owner is quite systematic. And while it may take him some time to get an idea of what’s happening, he should be able to figure it out eventually. But as you progress through this course, you’ll see that this puppy situation is surprisingly complex. For instance, he’ll have to find the balance between, on the one hand, exploring potential hypotheses for how to choose actions and, on the other, exploiting his limited knowledge of what already seems to work well. That is, should he settle for just one treat, or should he aim higher? This is a very important concept in reinforcement learning. While there are some widely adopted techniques for balancing these two competing requirements, this question has inspired an entire subfield of reinforcement learning and remains an area of active research.
Another thing that’s important to note is that if our puppy is truly a reinforcement learning agent, he’s not just concerned with the reward he can get now. Instead, his goal is to maximize the total number of treats he can get over his entire lifetime.
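To make the loop described above a little more concrete, here is a minimal sketch of the puppy as an agent that mostly exploits what it has learned but occasionally explores at random (an epsilon-greedy rule, which is one common way to balance the two, not necessarily the one this course settles on). Everything in it is an assumption made for illustration: the command and action names, the toy owner who gives one treat when the action matches the command, and the values of `epsilon` and `steps`. It is also simplified in that the next command does not depend on what the puppy just did, so it does not capture strategies that take longer to pay off.

```python
import random

# Hypothetical observations (owner commands) and puppy actions.
COMMANDS = ["sit", "run", "roll"]
ACTIONS = ["sit", "run", "roll"]


def owner_feedback(command, action):
    """Toy owner (an assumption): one treat if the action matches the command."""
    return 1.0 if action == command else 0.0


def run_puppy(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Estimated treat value and visit count for every (command, action) pair.
    value = {(c, a): 0.0 for c in COMMANDS for a in ACTIONS}
    count = {(c, a): 0 for c in COMMANDS for a in ACTIONS}
    total_treats = 0.0

    command = rng.choice(COMMANDS)  # first observation
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an action at random.
            action = rng.choice(ACTIONS)
        else:
            # Exploit: pick the action that currently looks best for this command.
            action = max(ACTIONS, key=lambda a: value[(command, a)])

        reward = owner_feedback(command, action)
        total_treats += reward

        # Update the running average of treats for this (command, action) pair.
        count[(command, action)] += 1
        value[(command, action)] += (reward - value[(command, action)]) / count[(command, action)]

        command = rng.choice(COMMANDS)  # updated observation from the owner

    return value, total_treats


if __name__ == "__main__":
    value, total_treats = run_puppy()
    for c in COMMANDS:
        best = max(ACTIONS, key=lambda a: value[(c, a)])
        print(f"command {c!r}: best known action is {best!r}")
    print("total treats:", total_treats)
```

Setting `epsilon` to 0 makes the puppy purely greedy: in this toy setup he would keep repeating the first action that ever looked acceptable and might never discover what the other commands call for.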
So while it might be relatively simple to derive a strategy that works well in the short term, he’ll have to be a bit more clever if he wants to find strategies that take longer to pay off. In general, this idea of learning from experience is intellectually quite interesting. For instance, if you’ve ever tried to train a dog, perhaps you’ve noticed that it’s not so easy. His first instinct might be “whenever I sit, I receive a treat” rather than “whenever my owner says sit and then I sit, I receive a treat.” Even though it seems like a reasonable mistake to make, we guarantee you that your agents will know better.
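The difference between those two rules can be seen in a small sketch. It compares a puppy that ignores the command and always sits with one that conditions his action on the command. The commands, the two strategies, and the toy owner (one treat only when the action matches the command, with commands drawn uniformly at random) are all assumptions made for illustration.

```python
import random

COMMANDS = ["sit", "run", "roll"]  # hypothetical owner commands


def treat(command, action):
    """Toy owner (an assumption): one treat only if the action matches the command."""
    return 1.0 if action == command else 0.0


def always_sit(command):
    """'Whenever I sit, I receive a treat': the observation is ignored."""
    return "sit"


def follow_command(command):
    """'Whenever my owner says sit and then I sit, I receive a treat.'"""
    return command


def average_treats(policy, steps=3000, seed=0):
    """Average treats per step when commands arrive uniformly at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        command = rng.choice(COMMANDS)
        total += treat(command, policy(command))
    return total / steps


if __name__ == "__main__":
    print("always sit:    ", average_treats(always_sit))
    print("follow command:", average_treats(follow_command))
```

Under these assumptions, the puppy that always sits is rewarded only when the command happens to be "sit", roughly a third of the time, while the puppy that takes the observation into account is rewarded on every step.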
