Batch size effect on validation accuracy


(Christos Iraklis Tsatsoulis) #1

Hi Jeremy & Rachel and thanks for the great MOOC.

I am running VGG16 on Cats vs Dogs (Lesson 1) on my laptop, which has a small NVIDIA Quadro K1100M GPU. Because of its limited memory, I have been experimenting with different (smaller) batch sizes. Here is what I got:

batch size      validation accuracy
   2                   0.88
   4                   0.97
   8                   0.98

i.e. it seems that validation accuracy benefits from a bigger batch size. Could you elaborate a little on this?

Many thanks in advance

Christos


(Jeremy Howard (Admin)) #2

Sure. With a small batch size, the gradients are only a very rough approximation of the true gradients. So it’ll take a lot longer to find a good solution. Generally you’ll want a batch size of around 64 if you can manage it. Smaller batch sizes are OK, but will take a bit longer.
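
To illustrate the "rough approximation" point, here is a minimal NumPy sketch. It is not from the course notebooks: the synthetic linear-regression data, the function name minibatch_gradient_error, and the batch sizes tried are all assumptions made for illustration. It measures how far a mini-batch gradient typically lands from the full-dataset gradient.

```python
import numpy as np

def minibatch_gradient_error(batch_size, n_trials=500, seed=0):
    """Average relative distance between a mini-batch gradient and the
    full-dataset gradient, for a synthetic (made-up) regression problem."""
    rng = np.random.default_rng(seed)
    n, d = 2000, 10
    X = rng.normal(size=(n, d))
    true_w = rng.normal(size=d)
    y = X @ true_w + 0.5 * rng.normal(size=n)
    w = np.zeros(d)  # weights at which all gradients are evaluated

    def grad(idx):
        # Gradient of mean squared error over the rows selected by idx.
        Xb, yb = X[idx], y[idx]
        return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

    full = grad(np.arange(n))
    errs = []
    for _ in range(n_trials):
        idx = rng.choice(n, size=batch_size, replace=False)
        errs.append(np.linalg.norm(grad(idx) - full))
    return float(np.mean(errs)) / float(np.linalg.norm(full))

errors = {bs: minibatch_gradient_error(bs) for bs in (2, 8, 64)}
for bs, err in errors.items():
    print(f"batch_size={bs:3d}  relative gradient error={err:.3f}")
```

On this toy problem the error shrinks as the batch grows, which is the sense in which each small-batch step is noisier. It says nothing about what final accuracy a real network will reach.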


(Rakpong Kittinaradorn) #3

Hi Jeremy. Can I ask a little further on this issue? My GPU has limited RAM, so I can only run with a very small batch size. Is there any way to get validation accuracy equivalent to running with a bigger batch?


(Matthew Kleinsmith) #4

One way is to train longer. Here’s what Jeremy said:


(Valentin Brasso) #5

Hello,

I’ve stumbled upon a very strange situation where the batch_size is the major factor in the validation set accuracy of my model. Proof:

model.fit(conv_trn_feat, trn_labels, nb_epoch=1, batch_size=256, validation_data=(conv_val_feat, val_labels))
Train on 2001 samples, validate on 500 samples
Epoch 1/1
2001/2001 [==============================] - 6s - loss: 0.0546 - acc: 0.9845 - val_loss: 1.4992 - val_acc: 0.7700
model.fit(conv_trn_feat, trn_labels, nb_epoch=1, batch_size=128, validation_data=(conv_val_feat, val_labels))
Train on 2001 samples, validate on 500 samples
Epoch 1/1
2001/2001 [==============================] - 9s - loss: 0.0439 - acc: 0.9835 - val_loss: 4.2225 - val_acc: 0.4520
model.fit(conv_trn_feat, trn_labels, nb_epoch=1, batch_size=256, validation_data=(conv_val_feat, val_labels))
Train on 2001 samples, validate on 500 samples
Epoch 1/1
2001/2001 [==============================] - 8s - loss: 0.0291 - acc: 0.9870 - val_loss: 1.4931 - val_acc: 0.7600
model.fit(conv_trn_feat, trn_labels, nb_epoch=1, batch_size=512, validation_data=(conv_val_feat, val_labels))
Train on 2001 samples, validate on 500 samples
Epoch 1/1
2001/2001 [==============================] - 8s - loss: 0.0183 - acc: 0.9940 - val_loss: 0.0129 - val_acc: 0.9960

I can achieve 99.6% validation accuracy in less than 10 epochs of training with batch_size=512, but with batch_size=128 I can’t get the validation accuracy past 48%, even after hundreds of epochs of training and even if I start from the same weights that I got by training the model with batch_size=512. In fact, even model.evaluate() gives me numbers in the same ballpark as the ones above, depending on what batch_size I feed it.

What can I do if I want to deploy this model on something that doesn’t have the RAM to handle batch_size=512?


(Pietz) #6

Feels like I’m in a similar position to you, although my results are not as extreme. Have you investigated the batch size problem any further?


(Valentin Brasso) #7

I haven’t investigated it further. However, batch size only affects backpropagation, so if I want to deploy on a low-end system for prediction only, I can train with a high batch_size and the model will still behave at maximum accuracy, since it only needs to do a feed-forward pass.


(Greg McKenzie) #8

When you deploy models, I interpret that as using the model to predict an answer.

As I understand it, batch_size matters for training: it controls how many samples are used to compute the gradients that update the weights in your model.

Once deployed, the model merely applies its weights at each layer to produce a single prediction.
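
A minimal sketch of this point in plain NumPy, not Keras. The two-layer network, its random weights, and the predict helper are hypothetical and only for illustration. It assumes a purely feed-forward model with no layer whose output depends on which other samples share the batch (for example, batch normalization run in training mode). Under that assumption, the batch size used for prediction only changes how the inputs are chunked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (already "trained") weights of a tiny hypothetical network.
W1, b1 = rng.normal(size=(20, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)

def forward(x):
    # Dense layer + ReLU, then a linear output layer.
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

def predict(x, batch_size):
    # Run the forward pass chunk by chunk and stitch the results together.
    chunks = [forward(x[i:i + batch_size]) for i in range(0, len(x), batch_size)]
    return np.concatenate(chunks, axis=0)

x = rng.normal(size=(500, 20))
preds_small = predict(x, batch_size=1)
preds_large = predict(x, batch_size=512)

# Same weights, same inputs: predictions agree up to floating-point rounding.
print(np.allclose(preds_small, preds_large))
```

In this setting a smaller prediction batch trades speed for memory and leaves the outputs unchanged. If measured accuracy does move with the evaluation batch size, as reported earlier in the thread, that suggests something in the model or pipeline violates the assumption above.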

I’m just ramping up with this NN, but that’s my understanding so far. Hope it helps.


(Pietz) #9

what loss function did you use?


(Valentin Brasso) #10

I used Mean Squared Error (‘mse’). I was working on a regression problem.


(vittorio) #11

Instead, what is the purpose of the validation batch size?


(Jonathan Aghachi) #12

What are the pros and cons of setting the batch size equal to the number of samples in one class, or to the size of the whole data set?