For large images or a small GPU, the batch size may need to be much smaller than what is best for training. Does anyone know of any functionality that could create an “effective batch size” to compensate for this?
You could run forward and backward passes on several sub-minibatches, accumulating gradients, and only then do optimizer.step(). I think it would be equivalent to using a single larger minibatch in the usual way.
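As a sanity check on that equivalence claim, here’s a minimal pure-Python sketch (toy scalar linear model, made-up data and learning rate, plain SGD only): accumulating the per-sub-minibatch mean gradients and averaging them gives the same update as one step over the full batch.

```python
# Toy demonstration that accumulating gradients over sub-minibatches and
# stepping once is equivalent to one step on the full batch (plain SGD).
# Model: scalar linear regression y = w*x, loss = mean squared error.
# All names and numbers here are illustrative, not from any library.

def grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) w.r.t. w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
lr, w0 = 0.01, 0.5

# One step on the full batch of size 4.
w_full = w0 - lr * grad(w0, xs, ys)

# Same step via two sub-minibatches of size 2: accumulate, average, step.
g_acc = 0.0
for i in range(0, 4, 2):
    g_acc += grad(w0, xs[i:i+2], ys[i:i+2])  # per-sub-batch mean gradient
w_accum = w0 - lr * (g_acc / 2)              # divide by number of sub-batches

print(abs(w_full - w_accum) < 1e-12)  # → True
```

Note the division by the number of sub-batches: without it, the accumulated step would be larger by that factor, which is one common source of surprises with this technique.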
Yeah, this is what I was thinking of doing if there doesn’t appear to be anything built into the library already.
If it isn’t already, it probably will be at some point, such as in a future part 2 or 3. It was done last year in part 2 of the course (I think). If you are using a smaller GPU, I think it is fine to use a smaller batch size until we get to that point. When you do multi-GPU training, a lot of the time you are effectively splitting batches across multiple GPUs, so no one GPU has enough space for the entire batch anyway. Getting the very best results generally requires very large batch sizes across multiple GPUs, though this is not needed in many cases.
In fastai v1 there was a batch size finder; I’ve been itching to port it over to v2. Here’s the Medium article for anyone interested:
Just to clarify, as that article is based on older information (2018): isn’t larger-batch-size training now more acceptable? (https://arxiv.org/pdf/1904.00962.pdf + last year’s fastai)
Just wanted to check, as I think the first tweet in that article is no longer true from my understanding, and I wouldn’t want to crush the dreams and aspirations of newer practitioners.
Yes it is, good catch. However, if we’re exploring optimizers generally, Ranger is the main one ported into the library; it uses RAdam + LookAhead and has a special fit function associated with it as well (fit_flat_cos). It was also used to get SOTA on ImageNette/Woof. LookAhead also utilizes mini-batch training (not wanting to get too off topic).
I’ll also add that you can still use a batch size finder here: you’re still just looking at the losses over time, so it will find a larger batch size (it functions similarly to lr_find()). Even with your mini-batches I think it should still work; otherwise you’d simply modify it to incorporate the optimizer’s mini-batches.
Just to make sure we are talking about the same thing: I’m not referring to what batch size is most effective for training. I’m saying that I can only fit a couple of images at a time into my GPU before exceeding its memory, so I need the code to accumulate gradients before performing an update step. Any thoughts on where the best place is to make such a change?
PyTorch Lightning has accumulated gradients
Accumulated gradients runs K small batches of size N before doing an optimizer step. The effect is a larger effective batch size of K×N.
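A sketch of that K-sub-batches control flow, using counters as stand-ins for the backward pass and optimizer step (K, N, and the data are made-up illustrative values, not tied to any particular framework):

```python
# Control-flow sketch of gradient accumulation: run K small batches of
# size N, accumulating gradients each time, and take one optimizer step
# per K sub-batches. Counters stand in for backward() and step().

K, N = 4, 2                      # accumulate over K sub-batches of size N
data = list(range(K * N * 3))    # enough samples for 3 effective batches

steps = 0            # would-be optimizer.step() calls
grad_calls = 0       # would-be loss.backward() calls
accumulated = 0      # sub-batches since the last step

for i in range(0, len(data), N):
    sub_batch = data[i:i + N]
    grad_calls += 1              # backward pass here: gradients add up
    accumulated += 1
    if accumulated == K:         # an effective batch of K*N samples seen
        steps += 1               # optimizer step, then zero the gradients
        accumulated = 0

print(steps, grad_calls)  # → 3 12
```

So with 24 samples, N = 2 gives 12 backward passes but only 3 parameter updates, each over an effective batch of K×N = 8 samples.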
I think fastai tried this approach, but it had some side effects (source)
edit: added source
Ah! Well there’s now an accumulate gradients callback!
Thanks! This is what I was looking for.
When I use GradientAccumulation, the training loss numbers I get are in the millions (and less than 1 when not using it), so I’m not sure it is currently working. Unless there is some trick to using it properly? It seems like it should be extremely straightforward from the documentation. Any thoughts?
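One thing worth checking (an assumption on my part, not something confirmed from the docs): some gradient-accumulation implementations sum the loss over the accumulated samples rather than averaging it, which inflates the reported number by roughly the effective batch size. A toy illustration with invented per-sample losses:

```python
# Illustration of how a sum reduction inflates a reported loss relative
# to a mean reduction. The per-sample losses below are made up.

per_sample_losses = [0.7, 0.4, 0.9, 0.6] * 256  # pretend batch of 1024

mean_loss = sum(per_sample_losses) / len(per_sample_losses)  # what you usually log
summed_loss = sum(per_sample_losses)                         # ~1024x larger

print(round(mean_loss, 2), round(summed_loss))  # → 0.65 666
```

If something like this is happening, the model may be training fine and only the logged number is scaled; comparing validation metrics with and without accumulation should tell you.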