PyTorch Lightning has accumulated gradients
Accumulated gradients runs K small batches of size N, summing their gradients, before doing an optimizer step. The effect is a large effective batch size of K×N.
https://pytorch-lightning.readthedocs.io/en/latest/training_tricks.html#accumulate-gradients
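For reference, a minimal sketch of what this looks like: in Lightning it's a single `Trainer` flag (`accumulate_grad_batches`), and the roughly equivalent hand-rolled loop (model, loader, loss_fn, optimizer are placeholder names here) just scales the loss and only steps the optimizer every K batches.

```python
import torch
from pytorch_lightning import Trainer

# In Lightning, gradient accumulation is a single Trainer argument:
# accumulate 4 batches of size N before each optimizer step,
# for an effective batch size of 4 * N.
trainer = Trainer(accumulate_grad_batches=4)

# Roughly equivalent hand-rolled loop in plain PyTorch
# (model, loader, loss_fn, optimizer are placeholders):
def train_accumulated(model, loader, loss_fn, optimizer, k=4):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y) / k   # scale so summed grads average out
        loss.backward()                   # grads accumulate in .grad across batches
        if (i + 1) % k == 0:
            optimizer.step()              # update once every k small batches
            optimizer.zero_grad()
```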
I think fastai tried this approach, but it had some side effects (source)
edit: added source