5 – Goals and Rewards, Part 1

So, I’d like to talk to you about some research that I find particularly interesting, and I think it’s a great example to illustrate the reward hypothesis that was introduced in the previous video. Google DeepMind recently addressed the problem of teaching a robot to walk. Among other problem domains, they worked with a physical simulation of a humanoid robot, and they managed to apply some nice reinforcement learning to get great results. As you learned in an earlier video, in order to frame this as a reinforcement learning problem, we’ll have to specify the states, actions, and rewards. We’ll dedicate two videos to this example, and we’ll begin by detailing the actions.

The actions are the decisions that need to be made in order for the robot to walk. The humanoid has several joints, and the actions are just the forces that the robot applies to its joints in order to move. If the robot has an intelligent method for deciding these forces at every point in time, that will be sufficient to get it walking.

And what about the states? The states are the context provided to the agent for choosing intelligent actions. In this case, the state at any point in time contains the current positions and velocities of all of the joints, along with some measurements about the surface that the robot is standing on. These measurements capture how flat or inclined the ground is, whether there is a large step along the path, and so on. The researchers at Google DeepMind also added contact sensor data, so that the agent can determine whether the robot is still walking or has fallen over.

The idea is that, based on the information in the state, the agent has to plan its next action. After all, if there’s a step along the path, that will require a different type of movement than if the ground were completely flat. We’ll design the reward as a feedback mechanism that tells the agent whether it has chosen the appropriate movements.
The reward will be our way of telling the agent, “Good job for not running into that wall,” or “Too bad, you missed that step and fell down.” That’s just the main idea, and we’ll go into depth in the next video.
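To make the state and action descriptions a little more concrete, here is a minimal Python sketch of how they might be represented. Everything in it is a hypothetical illustration, not DeepMind's actual setup: the `HumanoidState` class, its field names, the three-joint example, the idea that a torso contact signals a fall, and the `random_policy` placeholder are all assumptions. The state bundles joint positions and velocities, some terrain measurements, and contact-sensor readings, and the action is simply one force per joint. The reward is deliberately left out, since it is the topic of the next video.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class HumanoidState:
    """Hypothetical snapshot of what the agent observes at one time step."""
    joint_positions: List[float]   # current position (angle) of each joint
    joint_velocities: List[float]  # current velocity of each joint
    terrain: List[float]           # illustrative terrain features, e.g. slope, step height
    foot_contacts: List[bool]      # contact sensors on the feet
    torso_contact: bool            # contact sensor on the torso (assumed)

    def to_vector(self) -> List[float]:
        """Flatten the state into a single list of numbers for the agent."""
        contacts = [float(c) for c in self.foot_contacts] + [float(self.torso_contact)]
        return (list(self.joint_positions) + list(self.joint_velocities)
                + list(self.terrain) + contacts)

    def has_fallen(self) -> bool:
        """Illustrative check: treat the torso touching the ground as a fall."""
        return self.torso_contact


def random_policy(state: HumanoidState, max_force: float = 1.0) -> List[float]:
    """Placeholder policy: one force per joint, sampled uniformly at random.
    A learning agent would replace this with a learned state-to-force mapping."""
    return [random.uniform(-max_force, max_force) for _ in state.joint_positions]


# Usage: a toy three-joint "humanoid" standing on a slight incline.
state = HumanoidState(
    joint_positions=[0.0, 0.1, -0.2],
    joint_velocities=[0.0, 0.0, 0.05],
    terrain=[0.0, 0.3],
    foot_contacts=[True, False],
    torso_contact=False,
)
obs = state.to_vector()       # 3 + 3 + 2 + 2 + 1 = 11 numbers
action = random_policy(state)  # one force per joint
print(len(obs), len(action), state.has_fallen())  # 11 3 False
```

The point of the split is that the state carries everything the agent needs to choose its next move (including whether a step lies ahead), while the action is kept as small as possible: just the forces applied at the joints.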
