Lesson 9 - Smaller batch size results in lower loss

Hello to all!
I just trained the model from the pascal-multi notebook with two different batch sizes (64 and 32) while keeping all other hyperparameters the same. I noticed that after halving the batch size, the loss was also roughly halved. Shouldn't the loss be independent of the batch size?
Is this the intended behaviour? What am I missing here?



There was a recent DL Twitter kerfuffle about batch size, but I think the eventual conclusion was that small batches worked better early in training, and larger batches were good for fine-tuning after the model had taken shape.

I don’t believe we wrote the loss function for this one in a way that’s independent of batch size. You can easily change it so it is, however.
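To illustrate the point (a toy sketch in plain Python, not the notebook's actual `ssd_loss`): a loss that sums per-example errors grows with batch size, while one that averages them does not.

```python
def sum_loss(errors):
    # Sum reduction: the total grows with the number of examples in the batch.
    return sum(errors)

def mean_loss(errors):
    # Mean reduction: the result is independent of the batch size.
    return sum(errors) / len(errors)

per_example = 0.5
print(sum_loss([per_example] * 64), sum_loss([per_example] * 32))    # 32.0 16.0
print(mean_loss([per_example] * 64), mean_loss([per_example] * 32))  # 0.5 0.5
```

With a sum reduction, halving the batch size halves the loss, which matches what was observed above.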

Ah, yeah! Now that you've mentioned it, I see what I missed. So just dividing by the batch size at the end of the ssd_loss function should do it, right?


I expect that’s all you need.
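Concretely, the change amounts to something like this sketch (hypothetical names — it only models the final reduction step, not the real `ssd_loss`, which sums per-image localization and classification losses):

```python
def ssd_loss_normalized(per_image_losses):
    # The original loss sums per-image losses, so it scales with batch size;
    # dividing by the batch size at the end removes that dependence.
    return sum(per_image_losses) / len(per_image_losses)

# Same per-image loss, different batch sizes -> same normalized loss:
assert ssd_loss_normalized([2.0] * 32) == ssd_loss_normalized([2.0] * 64)
```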

There's still research being done on this, but the general conclusion right now is that smaller batches lead to solutions that generalise better to test data than larger batches do. However, changing some parameters can lead to the exact opposite: https://arxiv.org/pdf/1712.09913.pdf. There's also a video of this which is easier to watch.


Hello Thomas,
Can you share your pascal-multi notebook with me? I am getting an error when I run:

ssd_loss(batch, y, True)

The error reads:

RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'other'

My local fastai repo is up to date, on Ubuntu 16.04.

I suspect the network is still on the CPU. I am following the notebook sequentially…and Jeremy may have run some other cells earlier that placed the network on the GPU. I don’t know.


Never mind Thomas,
I’ve got this.

For those of you reading this and have seen the same error:

I am on Ubuntu 16.04 LTS, PyTorch 0.4, up to date on fastai, on a single GPU.

In pascal-multi.ipynb, get rid of all the .cpu calls, and in the forward method of the BCE_Loss class, change the line

t = V(t[:,:-1].contiguous())

to

t = V(t[:,:-1].contiguous().cuda())

The notebook then runs fine sequentially.
