Let’s now talk about the hyperparameters that relates to the model itself rather than the training or optimization process. The number of hidden units, in particular, is the hyperparameter I felt was the most mysterious when I started learning about machine learning. The main requirement here is to set a number of hidden units that is “large enough”. For a neural network to learn to approximate a function or a prediction task, it needs to have enough “capacity” to learn the function. The more complex the function, the more learning capacity the model will need. The number and architecture of the hidden units is the main measure for a model’s learning capacity. If we provide the model with too much capacity, however, it might tend to overfit and just try to memorize the training set. If you find your model overfitting your data, meaning that the training accuracy is much better than the validation accuracy, you might want to try to decrease the number of hidden units. You could also utilize regularization techniques like dropouts or L2 regularization. So, as far as the number of hidden units is concerned, the more, the better. A little larger than the ideal number is not a problem, but a much larger value can often lead to the model overfitting. So, if your model is not training, add more hidden units and track validation error. Keep adding hidden units until the validation starts getting worse. Another heuristic involving the first hidden layer is that setting it to a number larger than the number of the inputs has been observed to be beneficial in a number of tests. What about the number of layers? Andrej Karpathy tells us that in practice, it’s often the case that a three-layer neural net will outperform a two-layer net, but going even deeper rarely helps much more. The exception to this is convolutional neural networks where the deeper they are, the better they perform.