3 – Minibatch Size

Minibatch size is another hyperparameter that you've no doubt run into a number of times already. It affects the resource requirements of the training process, but it also impacts training speed and the number of iterations in ways that may not be as trivial as you might think.

It's important to review a little terminology first. Historically, there has been debate over whether it's better to do online training, also called stochastic training, where you fit a single example from the dataset to the model during each training step: using only that one example, you do a forward pass, calculate the error, then backpropagate and set adjusted values for all your parameters, and repeat this for every example in the dataset. The alternative is to feed the entire dataset to the training step and calculate the gradient from the error produced by looking at all the examples at once; this is called batch training. The abstraction commonly used today is to set a minibatch size: online training is when the minibatch size is one, batch training is when the minibatch size equals the number of examples in the training set, and we can set the minibatch size to any value in between. The recommended starting values for your experimentation are between one and a few hundred, with 32 often being a good candidate.

A larger minibatch size provides a computational boost by exploiting matrix multiplication in the training calculations, but that comes at the expense of needing more memory for the training process and generally more computational resources; some out-of-memory errors in TensorFlow can be eliminated by decreasing the minibatch size. It's important to note, however, that this computational boost has a cost. In practice, small minibatch sizes produce more noise in the error calculations, and this noise is often helpful in preventing the training process from stopping at a local minimum on the error curve rather than the global minimum that yields the best model. So while the computational boost incentivizes us to increase the minibatch size, this practical algorithmic benefit incentivizes us to make it smaller. In addition to 32, you might also want to experiment with 64, 128, and 256, and depending on your data and task you may have to try other values as well.

There is an experimental result for the effect of batch size on convolutional neural networks in the paper titled "Systematic Evaluation of CNN Advances on the ImageNet". It shows that with the same learning rate, the accuracy of the model decreases as the minibatch size grows. This is not purely an effect of the minibatch size, though; it also reflects the fact that we need to change the learning rate when we change the batch size. If we do adjust the learning rate as we increase the batch size, accuracy still decreases, but only slightly, as the batch size grows.

To sum up: for the minibatch size, too small could be too slow, while too large could be computationally taxing and could result in worse accuracy. Values of 32 to 256 are potentially good starting points for you to experiment with.
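As a concrete illustration of the terminology above, here is a minimal sketch of minibatch training using Keras's fit method, with toy random data standing in for a real training set. The model architecture, data shapes, learning rate, and epoch count are illustrative assumptions rather than values from this discussion; the only point is how the batch_size argument selects between online, minibatch, and full-batch training.

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for a real training set (shapes are arbitrary assumptions).
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# A small illustrative classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# batch_size is the minibatch size discussed above:
#   batch_size=1            -> online (stochastic) training
#   batch_size=len(x_train) -> full-batch training
#   anything in between     -> minibatch training (32 is a common starting point)
model.fit(x_train, y_train, batch_size=32, epochs=5)
```

If you change batch_size in a sketch like this, keep in mind from the discussion above that the learning rate passed to the optimizer will generally need to be re-tuned as well.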
