[Paper discuss] A disciplined approach to neural network hyper-parameters by Leslie Smith

This paper is a gem for the practitioner. Even though I had already seen many of these suggestions in the Fastai videos, it was good to understand the rationale behind why this or that works, and to pay attention to some ideas that are less discussed in the videos (such as all the discussion about weight decay that @KevinB mentioned).

One question that came to my mind: if we should use the largest batch size our GPU memory allows, and given that the size of the network does not change during training, couldn't we calculate the maximum (optimal) batch size from the GPU memory, the network size, and the frozen layers, and never run out of memory? Right? :thinking:
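For what it's worth, a purely analytic calculation is tricky in practice: besides weights, gradients, and optimizer state, you need activation memory (which scales with batch size but depends on the architecture), plus framework overhead like cuDNN workspaces and allocator fragmentation. So a common workaround is to probe empirically. Here's a minimal PyTorch sketch (the function name and toy model are my own placeholders, not from the paper) that doubles the batch size until a forward/backward pass hits CUDA out-of-memory:

```python
import torch
import torch.nn as nn

def find_max_batch_size(model, input_shape, device="cuda", start=2, limit=4096):
    """Double the batch size until a forward/backward pass raises CUDA OOM."""
    model = model.to(device)
    bs, max_ok = start, 0
    while bs <= limit:
        try:
            x = torch.randn(bs, *input_shape, device=device)
            model(x).sum().backward()          # include gradients, as in real training
            model.zero_grad(set_to_none=True)
            max_ok = bs                        # this batch size fit; try a bigger one
            bs *= 2
        except RuntimeError as e:              # CUDA OOM surfaces as a RuntimeError
            if "out of memory" in str(e).lower():
                torch.cuda.empty_cache()
                break
            raise
    return max_ok

if __name__ == "__main__":
    # Arbitrary toy model just to exercise the probe.
    net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512),
                        nn.ReLU(), nn.Linear(512, 10))
    print("max batch size:", find_max_batch_size(net, (3, 224, 224)))
```

One caveat: you'd want to probe with the same input size, precision, and set of unfrozen layers you plan to train with, since frozen layers skip gradient storage and all of those factors change the memory footprint.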