Papers: Batch size, learning rate, batch norm, generalization

I wanted to share some interesting, related papers on batch size, learning rate, batch norm, and generalization:

Train longer, generalize better: closing the generalization gap in large batch training of neural networks
https://arxiv.org/pdf/1705.08741.pdf

Don’t Decay the Learning Rate, Increase the Batch Size
https://openreview.net/pdf?id=B1Yy1BxCZ

Rethinking ImageNet Pre-training
https://arxiv.org/abs/1811.08883

How Does Batch Normalization Help Optimization?
https://arxiv.org/abs/1805.11604
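
To make the second paper's idea concrete, here is a minimal sketch of its schedule: keep the learning rate fixed and grow the batch size at the epochs where you would normally decay. The milestones, factor, and numbers below are illustrative assumptions, not values from the paper:

```python
def schedule(epochs=90, lr=0.1, batch_size=256, milestones=(30, 60, 80), k=5):
    """Yield (epoch, lr, batch_size), scaling batch size instead of lr.

    The SGD noise scale is roughly lr * N / batch_size (N = dataset size),
    so multiplying batch_size by k reduces it as much as lr /= k would.
    """
    for epoch in range(epochs):
        if epoch in milestones:
            batch_size *= k  # instead of lr /= k
        yield epoch, lr, batch_size

# Inspect the schedule around the milestones.
for epoch, lr, bs in schedule():
    if epoch in (0, 30, 60, 80):
        print(f"epoch {epoch}: lr={lr}, batch_size={bs}")
```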

Corollary food for thought:
Slow vs. fast convergence: what is the effect on final generalization?
Optimal batch size: single GPU (batch norm) vs. multi-GPU (mean/sum of gradients across devices) vs. gradient accumulation; what does each buy you? (See the sketch below.)
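
On the last question, here is a minimal PyTorch sketch of gradient accumulation, summing scaled gradients over several small forward/backward passes before a single optimizer step. It's not from any of the papers above; the model, data, and hyperparameters are all illustrative:

```python
import torch
import torch.nn as nn

# Illustrative model, data, and hyperparameters (all assumptions).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

micro_batch = 8   # what fits on one GPU at a time
accum_steps = 4   # effective batch size = micro_batch * accum_steps = 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(micro_batch, 128)           # stand-in for real inputs
    y = torch.randint(0, 10, (micro_batch,))    # stand-in for real labels
    loss = loss_fn(model(x), y)
    # Scale each micro-batch loss so the accumulated gradient equals the
    # mean-loss gradient over the full effective batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

One caveat tied to the batch norm part of the question: BN statistics are still computed per micro-batch, so accumulation (like per-device BN in multi-GPU training) is not exactly equivalent to a true large batch whenever the model contains BN layers.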
