Increasing loss for small batch size

It’s my first post here. I finished the first part of the book and I’ve written a simple image classifier.

I recently went to the Uffizi Museum in Florence, where I took some pictures of paintings and statues, so I wrote a neural network to identify whether an image contains a painting or a statue.
It works very well despite the small amount of training data. I’m amazed: it even classified a painting that portrays a statue as a painting with 99% confidence! :smiley:

While writing the code for the NN I noticed that if I use a batch size (bs) of 5, the train_loss increases after epoch 3 (out of a total of 4):

dls = datablock.dataloaders('/content/images', bs=5)
learn = cnn_learner(dls, resnet18, metrics=error_rate)

If I instead use a batch size of 10, the train_loss decreases from epoch 1 to epoch 4, as I expected.

Do you know why, with a small batch size, the train_loss starts to increase at a certain epoch?



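It’s the “stochastic” in SGD. With a smaller batch size, each gradient is estimated from fewer samples, so the variance of the gradient is larger and the training loss can bounce around or even go up from one epoch to the next.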
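To make the variance point concrete, here is a rough sketch on a toy problem, not your image classifier. Everything in it is an assumption made for illustration: the 1-D least-squares data, the helper name `minibatch_gradient_variance`, and the sample and trial counts. It draws many random mini-batches and measures how much the mini-batch gradient varies for bs=5 versus bs=10.

```python
import numpy as np

def minibatch_gradient_variance(batch_size, n_samples=1000, n_trials=2000, seed=0):
    """Estimate the variance of the mini-batch gradient on a toy 1-D problem.

    Hypothetical helper for illustration only; not part of fastai.
    """
    rng = np.random.default_rng(seed)
    # Toy data: y = 3x + noise. The model is y_hat = w * x, evaluated at w = 0.
    x = rng.normal(size=n_samples)
    y = 3.0 * x + rng.normal(scale=0.5, size=n_samples)
    w = 0.0
    # Per-sample gradient of 0.5 * (w * x - y)**2 with respect to w.
    per_sample_grad = (w * x - y) * x
    # Draw many random mini-batches and record the mean gradient of each one.
    batch_grads = np.array([
        per_sample_grad[rng.choice(n_samples, size=batch_size, replace=False)].mean()
        for _ in range(n_trials)
    ])
    return batch_grads.var()

var_bs5 = minibatch_gradient_variance(5)
var_bs10 = minibatch_gradient_variance(10)
print(f"gradient variance with bs=5:  {var_bs5:.3f}")
print(f"gradient variance with bs=10: {var_bs10:.3f}")
```

On this toy setup the variance at bs=5 should come out roughly twice the variance at bs=10, since averaging over B samples shrinks the variance by about a factor of B. Noisier gradients mean some steps move the weights in a worse direction, which is one plausible reason the train_loss can tick up late in training.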


Thanks, I guess the explanation is in the next parts of the book after the first :smiley: