3 – M3L3 C03 V2

Before moving on, let's talk a little bit more about what we just did and how it's related to supervised learning. As we discussed in the previous video, we begin by playing the game for an episode. If we make it to the other end of the street safely and in time, then we win the game. Then, for each state-action pair in the episode, we modify the network just a little bit to make it slightly more likely to select that action when it encounters the corresponding state. The idea was that if we won, those must have been good actions to select from their corresponding states. So we modify the network to reflect that, making it more likely to experience similar gameplay in the future.

Now, remember how we use supervised learning for image classification. We have a dataset of images along with their corresponding labels, and we want to train a neural network to predict the label corresponding to any image. What we do is pass an image through the network to get a prediction. If the prediction is incorrect, we change the network weights just a little bit so that the prediction is slightly more correct. And if it's correct, we again nudge the weights just a little bit, so that the network is more certain of the correct label. We just loop over the dataset until, eventually, the neural network gets as close as possible to giving us accurate predictions for each image.

This is really similar to what we described with reinforcement learning: each has a dataset of input-output pairs that we'll use to train the corresponding network. One important difference is that in image classification, we typically work with a dataset that doesn't change over time. For instance, we download the ImageNet dataset once and then just pull random batches from it to train the network. In the reinforcement learning setting, however, the dataset varies by episode.
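The "nudge the network toward the actions from a winning episode" step above can be sketched in a few lines. This is a minimal toy sketch, assuming a tiny tabular softmax policy over discrete states and actions; every name here (`action_probs`, `nudge_toward`, the episode data) is a hypothetical placeholder, not the course's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
weights = rng.normal(scale=0.1, size=(n_states, n_actions))  # policy parameters

def action_probs(state):
    """Softmax over the logits for one state."""
    logits = weights[state]
    e = np.exp(logits - logits.max())
    return e / e.sum()

def nudge_toward(state, action, lr=0.5):
    """After a win, make `action` slightly more likely from `state`.

    For a softmax policy, the gradient of log pi(action | state) with
    respect to the logits is one_hot(action) - probs.
    """
    probs = action_probs(state)
    grad = -probs
    grad[action] += 1.0
    weights[state] += lr * grad

# Suppose a winning episode visited these (state, action) pairs:
episode = [(0, 1), (2, 0), (3, 1)]
before = [action_probs(s)[a] for s, a in episode]
for s, a in episode:
    nudge_toward(s, a)
after = [action_probs(s)[a] for s, a in episode]
print(all(b < a for b, a in zip(before, after)))  # → True
```

After the nudges, each action taken during the winning episode is more probable from its state, which is exactly the "make similar gameplay more likely" effect described above.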
So we use the policy to collect an episode, which gives us a dataset: a bunch of matched state-action pairs. We then use that dataset once to do a batch of updates. After those updates are done, we discard the dataset, collect another episode, which gives us another dataset, and so on. So the dataset changes pretty frequently. Furthermore, it's highly likely that this dataset will have multiple conflicting opinions about what the best output should be for a given input, or, in other words, about what the best action is to take from a game state. The equivalent for image classification would be if the same image appeared twice in the dataset, where one entry said the image contained a dog and the other said it contained a cat. This is something we won't encounter with image classification, and it does make our current situation more complex, but also, I think, more interesting.
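The collect-update-discard loop and the "conflicting opinions" situation above can be sketched as follows. The environment and policy here are toy stand-ins, and all names are hypothetical, not the course's actual code.

```python
import random

random.seed(0)

def collect_episode(policy, num_steps=6, num_states=3):
    """Play one toy episode; returns matched (state, action) pairs."""
    dataset = []
    for _ in range(num_steps):
        state = random.randrange(num_states)
        dataset.append((state, policy(state)))
    return dataset

def conflicting_states(dataset):
    """States paired with more than one action, like one image labeled
    both "dog" and "cat" in a classification dataset."""
    by_state = {}
    for state, action in dataset:
        by_state.setdefault(state, set()).add(action)
    return sorted(s for s, acts in by_state.items() if len(acts) > 1)

random_policy = lambda state: random.randrange(2)

for _ in range(3):
    dataset = collect_episode(random_policy)  # fresh dataset this episode
    # ... one batch of updates on `dataset` would go here ...
    dataset = None                            # discarded before the next episode

# A hand-built episode where state 2 appears with both actions:
episode = [(0, 1), (2, 0), (1, 1), (2, 1)]
print(conflicting_states(episode))  # → [2]
```

Unlike a fixed labeled dataset such as ImageNet, each episode's dataset exists only for one batch of updates, and the same state can legitimately show up with different actions.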
