I've been using the pets model as my sandbox, trying various options to compare fp16 and fp32 on my 2080. My latest test compares runs with and without `.normalize(imagenet_stats)`.
I do three frozen epochs (`fit_one_cycle(3, max_lr=slice(1e-3))`), then unfreeze and run forty epochs with a tiny learning rate (`fit_one_cycle(40, max_lr=slice(1e-7, 1e-5))`). ResNet34, BS 64, SZ 256.
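For anyone who wants to reproduce this, here's roughly what a run looks like (fastai v1; I'm assuming the standard pets notebook for the paths and filename regex). Comment out `.normalize(imagenet_stats)` for the no-norm runs and enable `.to_fp16()` for the half-precision ones:

```python
from fastai.vision import *
from fastai.metrics import error_rate

# Standard pets setup from the fastai course notebook
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
pat = r'/([^/]+)_\d+.jpg$'

data = ImageDataBunch.from_name_re(
    path/'images', fnames, pat,
    ds_tfms=get_transforms(), size=256, bs=64  # SZ 256, BS 64
).normalize(imagenet_stats)                    # drop this call for the "nn" runs

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
# learn = learn.to_fp16()                      # enable for the fp16 runs

learn.fit_one_cycle(3, max_lr=slice(1e-3))     # three frozen epochs
learn.unfreeze()
learn.fit_one_cycle(40, max_lr=slice(1e-7, 1e-5))  # forty epochs, tiny lr
```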
Perhaps because I am starting from a well-trained model, the tests without normalization have not blown up and appear to consistently beat the normalized tests.
From the chart, you can see that the fp16 models tend to flatten out around 30 epochs, while the fp32 models are still jumping around. The best is fp32 with no normalization (nn), which is well ahead of the others even after the frozen epochs. I ran that model a second time (32nn2), and while the results were not as good, they were still better than the other options. I ran more epochs, and 32nn2 reached .045 after another 25 epochs; at that point, train loss was .197 and valid loss was .167.
I have tried various SZ and BS, as well as RN50 variations, but the results have consistently favored fp32 and no normalization. I’m currently rerunning RN50 variations from scratch and will post another chart in a few hours.