I recently came across the Group Normalization paper.
My intuition is that as batch sizes get smaller, the error in batch normalization increases rapidly because the per-batch estimates of mean and variance become noisier. Group Normalization avoids this by computing statistics over groups of channels within each sample, so they don't depend on the batch dimension at all.
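To make the idea concrete, here is a minimal NumPy sketch of group normalization (my own illustration of the paper's equations, not fast.ai's or PyTorch's implementation; the group count of 8 is just an example):

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # x has shape (N, C, H, W). Split the C channels into groups and
    # normalize each group per sample, so the statistics never involve
    # the batch dimension N -- a batch of 1 behaves like a batch of 64.
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)

# Works with a batch size of 1, where batch norm statistics collapse.
x = np.random.randn(1, 32, 16, 16)
y = group_norm(x, num_groups=8)
print(y.shape)  # (1, 32, 16, 16)
```

(In practice you would also learn a per-channel scale and shift, as batch norm does; I've left those out for brevity.)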
The benefit to this community is that it would remove the penalty of using less capable hardware: we could train with very small batch sizes without compromising the final accuracy of the model.
Would this be something that could be implemented under the covers in fast.ai?