Question about train_imagenette.py

It feels like I shouldn’t be opening a new thread for this but here we go:

In both train_imagenette and train_imagenet, we have the following line:
bs_rat = tot_bs/256.

Is 256 a hard-coded value that should actually be our chosen batch size per GPU?

That would explain why I got better results when replicating results from the Imagenette leaderboard to use as my baseline (I hadn’t used that line). Jeremy used bs=64, so his lr would have been unintentionally divided by 4.


You can definitely make it a parameter if you want. It’s just that the best learning rate was found with bs=256 in our case, so as a rule of thumb we scale it by the ratio of the actual total batch size to 256.
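Here is a minimal sketch of that rule of thumb with the reference batch size exposed as a parameter. The names scale_lr and ref_bs are made up for illustration and are not in the scripts; it also assumes the learning rate is simply multiplied by bs_rat, which is what the "divided by 4" observation above implies.

```python
def scale_lr(base_lr, tot_bs, ref_bs=256):
    """Scale a learning rate tuned at ref_bs to a different total batch size.

    Hypothetical helper: ref_bs is the batch size the base learning rate
    was tuned for (256 in the scripts discussed here).
    """
    bs_rat = tot_bs / ref_bs  # same ratio as `bs_rat = tot_bs/256.`
    return base_lr * bs_rat


if __name__ == "__main__":
    base_lr = 1e-3  # illustrative value only, not taken from the scripts
    for tot_bs in (64, 256, 512):
        print(tot_bs, scale_lr(base_lr, tot_bs))
```

With tot_bs=64 the ratio is 0.25, so the learning rate ends up a quarter of the base value. Passing ref_bs equal to your own batch size leaves the learning rate untouched.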


Thanks for the clarification!