I am running VGG16 on Cats vs Dogs (Lesson 1) on my laptop, which has a small NVIDIA Quadro K1100M GPU, and because of its limited memory I have been experimenting with different (smaller) batch sizes. Here is what I got:
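For context, this is roughly what I am running: the Lesson 1 notebook code with the course's vgg16.Vgg16 wrapper, where only batch_size changes between runs (the data path here is just a placeholder):

```python
from vgg16 import Vgg16

path = 'data/dogscats/'   # placeholder for wherever the data lives
batch_size = 32           # reduced from the notebook's 64 to fit in GPU memory

vgg = Vgg16()
batches = vgg.get_batches(path + 'train', batch_size=batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size=batch_size * 2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
```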
Sure. With a small batch size, the gradients are only a very rough approximation of the true gradients. So it’ll take a lot longer to find a good solution. Generally you’ll want a batch size of around 64 if you can manage it. Smaller batch sizes are OK, but will take a bit longer.
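A quick toy illustration of this point (a made-up regression problem, not the lesson's model): the mini-batch gradient is an unbiased estimate of the full gradient, but its spread grows as the batch shrinks, so each step is noisier.

```python
import numpy as np

# How noisy is a mini-batch gradient for a simple squared-error loss,
# as a function of batch size?
np.random.seed(0)
n = 10000
x = np.random.randn(n)
y = 3.0 * x + np.random.randn(n) * 0.5   # true slope = 3
w = 0.0                                   # current parameter estimate

def grad(xb, yb, w):
    # d/dw of the mean squared error 0.5 * (w*x - y)^2
    return np.mean((w * xb - yb) * xb)

full = grad(x, y, w)                      # "true" gradient over the whole set
for bs in (4, 64, 512):
    est = np.array([grad(x[i], y[i], w)
                    for i in (np.random.choice(n, bs, replace=False)
                              for _ in range(1000))])
    # bias is ~0 for every batch size, but the spread shrinks as bs grows
    print(bs, abs(est.mean() - full), est.std())
```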
Hi Jeremy. Can I ask a bit more about this issue? My GPU has limited RAM, so I can only run with a very small batch size. Is there any way to get validation accuracy equivalent to what I'd get with a bigger batch?
I can achieve 99.6% validation accuracy in fewer than 10 epochs of training with batch_size=512, but with batch_size=128 I can't get the validation accuracy past 48%, even after hundreds of epochs of training and even if I start from the same weights I trained with batch_size=512. In fact, even model.evaluate() gives me numbers in the same ballpark as the ones above, depending on what batch_size I feed it.
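For reference, this is roughly how I'm checking that (the filenames and arrays are placeholders for my actual validation data):

```python
import numpy as np
from keras.models import load_model

# Placeholder names; the only thing being varied here is batch_size.
# Assumes the model was compiled with metrics=['accuracy'].
model = load_model('catsdogs_vgg16.h5')
x_val = np.load('x_val.npy')
y_val = np.load('y_val.npy')

for bs in (512, 128, 32):
    loss, acc = model.evaluate(x_val, y_val, batch_size=bs, verbose=0)
    print('batch_size=%d  loss=%.4f  acc=%.4f' % (bs, loss, acc))
```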
What can I do if I want to deploy this model on something that doesn’t have the RAM to handle batch_size=512?
I haven’t investigated it further. However, this only affects backpropagation, so if I want to deploy on a low-end system for prediction only, I can train with a high batch_size and the model will still behave at maximum accuracy, since inference only needs a forward pass.
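A minimal sketch of what I mean (the filename and input shape are placeholders; the channel ordering depends on your backend):

```python
import numpy as np
from keras.models import load_model

# Load the weights trained with the large batch size...
model = load_model('catsdogs_vgg16.h5')   # placeholder filename

# ...and run inference with whatever batch size fits on the small device.
# (3, 224, 224) assumes the Theano channel ordering used in the course;
# use (224, 224, 3) for TensorFlow ordering.
x = np.random.rand(8, 3, 224, 224).astype('float32')  # stand-in for real images
preds = model.predict(x, batch_size=1)                # forward pass only
print(preds.shape)
```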
Hi Jeremy,
I’m a bit confused about the explanation. Isn’t faster convergence to a (nearly) optimal solution the major reason behind SGD and mini-batch gradient descent? As far as I understand, reducing the batch size helps you reach the optimum faster, but at the cost of some final accuracy. So, will training for longer improve the accuracy?
Sorry if I’m missing something.
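As a rough sanity check on that last question, here is a toy experiment (a made-up one-parameter regression, not the VGG16 model): a small batch size can still converge to the same solution, it just needs many more updates, and in practice often a smaller learning rate as well.

```python
import numpy as np

# Fit the slope of y = 3x + noise with plain mini-batch SGD, comparing
# a large batch / few updates against a small batch / many updates.
np.random.seed(0)
n = 10000
x = np.random.randn(n)
y = 3.0 * x + np.random.randn(n) * 0.5

def train(batch_size, steps, lr=0.05):
    w = 0.0
    for _ in range(steps):
        i = np.random.choice(n, batch_size)
        g = np.mean((w * x[i] - y[i]) * x[i])  # mini-batch gradient of the MSE
        w -= lr * g
    return w

print(train(batch_size=512, steps=200))    # large batch, few updates
print(train(batch_size=8, steps=5000))     # small batch, many more updates
# Both estimates end up near the true slope of 3; the small-batch run just
# needs far more updates and is noisier around the optimum.
```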