When we minimize the network error using backpropagation, we may either fit the model to the data properly or overfit it. With a finite training set there is always a risk of overfitting: the model fits the training data too closely. In other words, we have overtrained the network on our data, and as a result it also captures the noise, the random component of the training set. When that happens, the model does not generalize well to new inputs.

There are two main approaches to addressing overfitting. The first is to stop the training process early; the second is regularization. With early stopping, we halt training in the region where the network begins to overfit, which limits the degradation in performance on the test set. Ideally we would know precisely when to stop, but that point is often hard to determine. One practical way is to carve a small subset out of the training set, called the validation set. Assuming that accuracy on the validation set tracks accuracy on the test set, we can use it to estimate when training should stop. The drawback of this approach is that we are left with fewer samples to train the model on.

The alternative mainstream approach to mitigating overfitting is regularization: we impose a constraint on the training of the network so that it generalizes better. Dropout is a widely used regularization scheme of this kind.
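To make early stopping concrete, here is a minimal, framework-agnostic sketch of a patience-based version. The names train_with_early_stopping, train_one_epoch, and evaluate are our own placeholders rather than part of any particular library; we assume the reader supplies the training pass and the validation-loss routine for their own network.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              max_epochs=100, patience=5):
    """Train until the validation loss stops improving for `patience` epochs.

    `train_one_epoch(model)` and `evaluate(model)` are caller-supplied:
    one pass of backpropagation over the training set, and the loss
    measured on the held-out validation set, respectively.
    """
    best_val_loss = float("inf")
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)                 # fit the training data a little more
        val_loss = evaluate(model)             # check generalization on the validation set

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_model = copy.deepcopy(model)  # remember the best model seen so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # validation loss has plateaued: stop early

    return best_model
```

The returned model is the one with the lowest validation loss seen during training, not necessarily the final one, which is what we want when the network has already begun to overfit.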
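Dropout itself can be sketched in a few lines. The function below is our own illustration of the standard "inverted dropout" formulation: during training each activation is zeroed with probability drop_prob and the survivors are rescaled so the expected activation is unchanged; at test time the layer does nothing.

```python
import numpy as np

def dropout_forward(activations, drop_prob=0.5, training=True):
    """Inverted dropout: randomly zero units during training and rescale
    the rest, so no change is needed at test time."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = np.random.rand(*activations.shape) < keep_prob  # which units survive
    return activations * mask / keep_prob                  # rescale the survivors
```

Because each forward pass drops a different random subset of units, no single unit can rely on the presence of any other, which constrains the network and improves generalization.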