Effective Batch Size

PyTorch Lightning has accumulated gradients built in.

With accumulated gradients, you run K small batches of size N, summing the gradients across them, before doing an optimizer step. The effect is a larger effective batch size of K×N.

https://pytorch-lightning.readthedocs.io/en/latest/training_tricks.html#accumulate-gradients
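For reference, here is roughly what this looks like. The `accumulate_grad_batches` flag is the documented Lightning API (see the link above); the loop below it is just a sketch of the equivalent plain-PyTorch pattern, with `model`, `optimizer`, `criterion`, and `dataloader` standing in as hypothetical placeholders:

```python
import pytorch_lightning as pl

# Lightning: accumulate gradients over K=4 batches per optimizer step,
# so a per-batch size of N behaves like an effective batch size of 4*N.
trainer = pl.Trainer(accumulate_grad_batches=4)

# Roughly equivalent plain-PyTorch loop (sketch only; model, optimizer,
# criterion, and dataloader are placeholders):
K = 4
optimizer.zero_grad()
for i, (x, y) in enumerate(dataloader):
    loss = criterion(model(x), y) / K  # scale so accumulated grads average out
    loss.backward()                    # grads sum into param.grad across batches
    if (i + 1) % K == 0:
        optimizer.step()               # one optimizer step per K micro-batches
        optimizer.zero_grad()
```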

I think fastai tried this approach, but it had some side effects (source).

edit: added source
