5 – Dropout

Here’s another way to prevent overfitting. Let’s say that one day you decide to practice sports. On Monday you play tennis, on Tuesday you lift weights, on Wednesday you play American football, on Thursday you play baseball, on Friday you play basketball, and on Saturday you play ping pong. After a week, you notice that you’ve done most of them with your dominant hand, so you’re developing a large muscle on that arm but not on the other one. That’s disappointing. So what can you do? Let’s spice it up the next week. On Monday we’ll tie our right hand behind our back and play tennis with the left hand. On Tuesday we’ll tie our left hand behind our back and lift weights with the right hand. On Wednesday we’ll again tie our right hand and play American football with the left one. On Thursday we’ll take it easy and play baseball with both hands; that’s fine. On Friday we’ll tie both hands behind our back and try to play basketball. That won’t work out too well, but it’s OK, it’s the training process. And on Saturday we’ll again tie our left hand behind our back and play ping pong with the right. After that week, we see that we’ve developed both of our biceps. Pretty good job.

Something similar happens a lot when we train neural networks. Sometimes one part of the network has very large weights and ends up dominating the training, while another part doesn’t play much of a role and never really gets trained. To solve this, we sometimes turn that dominating part off during training and let the rest of the network train. More specifically, as we go through the epochs, we randomly turn off some of the nodes and say: you shall not pass through here. The other nodes then have to pick up the slack and take a bigger part in the training. For example, in the first epoch one node is turned off, so we do our feedforward and backpropagation passes without using it. In the second epoch two other nodes are turned off, and again we do feedforward and backprop without them. In the third epoch a different set of nodes is off, and so on, epoch after epoch.

To decide which nodes to drop, we give the algorithm a parameter: the probability that each node gets dropped at a particular epoch. For example, if we set it to 0.2, then in each epoch every node gets turned off with a probability of 20 percent. Notice that some nodes may get turned off more often than others, and some may never get turned off. That’s OK, because we do this over and over; on average, each node gets the same treatment. This method is called dropout, and it’s a really common and useful way to train neural networks.
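To make the mechanics concrete, here is a minimal NumPy sketch of that idea (the function name, shapes, and parameter values are just illustrative, not from the lesson). One caveat: the lesson describes dropping nodes per epoch, but most practical implementations resample the mask on every forward pass (per mini-batch) and rescale the surviving activations by 1/(1 − p), so-called inverted dropout, so that nothing needs to change at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, drop_prob=0.2, training=True):
    """Apply (inverted) dropout to a layer's activations.

    Each node is turned off with probability `drop_prob`; the survivors
    are scaled by 1 / (1 - drop_prob) so the expected activation stays
    the same, which lets us skip any rescaling at test time.
    """
    if not training or drop_prob == 0.0:
        return activations  # at test time every node participates
    # Binary mask: 1 keeps a node, 0 drops it ("you shall not pass").
    mask = rng.random(activations.shape) > drop_prob
    return activations * mask / (1.0 - drop_prob)

# Example: hidden-layer activations for a batch of 4 examples, 6 nodes.
hidden = rng.standard_normal((4, 6))
print(dropout_forward(hidden, drop_prob=0.2))
```

In a framework you would normally reach for the built-in layer instead, for example `Dropout(0.2)` in Keras or `nn.Dropout(p=0.2)` in PyTorch, which handle the training/evaluation switch for you.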
