So, let’s talk about life. In life, there are two mistakes one can make. One is to try to kill Godzilla using a flyswatter. The other one is to try to kill a fly using a bazooka. What’s the problem with trying to kill Godzilla with a flyswatter? That we’re oversimplifying the problem. We’re trying a solution that is too simple and won’t do the job. In machine learning, this is called underfitting. And what’s the problem with trying to kill a fly with a bazooka? It’s overly complicated and it will lead to bad solutions and extra complexity when we can use a much simpler solution instead. In machine learning, this is called overfitting Let’s look at how overfitting and underfitting can occur in a classification problem. Let’s say we have the following data, and we need to classify it. So what is the rule that will do the job here? Seems like an easy problem, right? The ones in the right are dogs while the ones in the left are anything but dogs. Now what if we use the following rule? We say that the ones in the right are animals and the ones in the left are anything but animals. Well, that solution is not too good, right? What is the problem? It’s too simple. It doesn’t even get the whole data set right. See? It misclassified this cat over here since the cat is an animal. This is underfitting. It’s like trying to kill Godzilla with a flyswatter. Sometimes, we’ll refer to it as error due to bias. Now, what about the following rule? We’ll say that the ones in the right are dogs that are yellow, orange, or grey, and the ones in the left are anything but dogs that are yellow, orange, or grey. Well, technically, this is correct as it classifies the data correctly. There is a feeling that we went too specific since just saying dogs and not dogs would have done the job. But this problem is more conceptual, right? How can we see the problem here? Well, one way to see this is by introducing a testing set. If our testing set is this dog over here, then we’d imagine that a good classifier would put it on the right with the other dogs. But this classifier will put it on the left since the dog is not yellow, orange, or grey. So, the problem here, as we said, is that the classifier is too specific. It will fit the data well but it will fail to generalize. This is overfitting. It’s like trying to kill a fly with a bazooka. Sometimes, we’ll refer to overfitting as error due to variance. The way I like to picture underfitting and overfitting is when studying for an exam. Underfitting, it’s like not studying enough and failing. A good model is like studying well and doing well in the exam. Overfitting is like instead of studying, we memorize the entire textbook word by word. We may be able to regurgitate any questions in the textbook but we won’t be able to generalize properly and answer the questions in the test. But now, let’s see how this would look like in neural networks. So let’s say this data where, again, the blue points are labeled positive and the red points are labeled negative. And here, we have the three little bears. In the middle, we have a good model which fits the data well. On the left, we have a model that underfits since it’s too simple. It tries to fit the data with the line but the data is more complicated than that. And on the right, we have a model that overfits since it tries to fit the data with an overly complicated curve. Notice that the model in the right fits the data really well since it makes no mistakes, whereas the one in the middle makes this mistake over here. But we can see that the model in the middle will probably generalize better. The model in the middle looks at this point as noise while the one in the right gets confused by it and tries to feed it too well. Now the model in the middle will probably be a neural network with a slightly complex architecture like this one. The one in the left will probably be an overly simplistic architecture. Here, for example, the entire neural network is just one preceptors since the model is linear. The model in the right is probably a highly complex neural network with more layers and weights than we need. Now here’s the bad news. It’s really hard to find the right architecture for a neural network. We’re always going to end either with an overly simplistic architecture like the one in the left or an overly complicated one like the one in the right. Now the question is, what do we do? Well, this is like trying to fit in a pair of pants. If we can’t find our size, do we go for bigger pants or smaller pants? Well, it seems like it’s less bad to go for a slightly bigger pants and then try to get a belt or something that will make them fit better, and that’s what we’re going to do. We’ll err on the side of an overly complicated models and then we’ll apply certain techniques to prevent overfitting on it.