From the blog post …
With the increased productivity this enabled, we were able to try far more techniques, and in the process we discovered a number of current standard practices that are actually extremely poor approaches. For example, we found that the combination of batch normalisation (which nearly all modern CNN architectures use) and model pretraining and fine-tuning (which you should use in every project if possible) can result in a 500% decrease in accuracy using standard training approaches. (We will be discussing this issue in-depth in a future post.) The results of this research are being incorporated directly into our framework.
Which “standard training approaches” are meant here, i.e. the ones where combining batchnorm with pre-training/fine-tuning doesn’t work well?
Does that include the approaches we’ve been learning in parts 1 & 2, where batchnorm is everywhere and the general recommendation has been to precompute the convolutional features for CNNs?