Thanks a lot for checking this out. I played around with these parameters (wd, true_wd, bn_wd) without any notable effect.
Yeah, though it looks like you are doing shortish runs (for obvious reasons); it might be a speed thing and PyTorch/TF will catch up in the end.
This didn’t seem to be the reason: in PyTorch/TF I also trained much longer and experimented with different learning rates, but I never got accuracies higher than 80%.
But now, I finally found out where the magic is happening:
I thought that when you create a learner using the cnn_learner() function, the complete pretrained CNN body would be frozen by default and only the appended head would be trainable.
This is NOT the case! The BatchNorm layers in the pretrained model are trainable by default! In my TF/PyTorch experiments, I trained the models with frozen BatchNorm layers, which was the reason for the different performance.
You can configure this using the train_bn parameter when creating a new Learner:
learn = cnn_learner(data, models.resnet50, metrics=[accuracy], train_bn=False)
… This gives me only 71% accuracy after 3 epochs (compared to 89% when train_bn=True).
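For anyone wanting to reproduce this behavior outside of fastai, here is a minimal PyTorch sketch (my own illustration, not fastai's actual code) of what train_bn=True effectively does: freeze the pretrained body but leave the BatchNorm layers trainable. The toy body/head model is a stand-in for e.g. a pretrained resnet50.

```python
import torch.nn as nn

# Toy "pretrained body + head" stand-in; in practice the body would be
# e.g. torchvision's resnet50 loaded with pretrained weights.
body = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3),
    nn.BatchNorm2d(32),
)
head = nn.Linear(32, 10)

def freeze_body_except_bn(body: nn.Module) -> None:
    """Mimic fastai's train_bn=True: freeze everything in the body
    except BatchNorm layers, whose affine parameters stay trainable
    (and whose running stats keep updating in train mode)."""
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    for module in body.modules():
        trainable = isinstance(module, bn_types)
        # recurse=False: only touch the parameters owned directly
        # by this module, so BN params are set exactly once.
        for p in module.parameters(recurse=False):
            p.requires_grad = trainable

freeze_body_except_bn(body)
# The appended head stays fully trainable, as in fastai.
```

With train_bn=False you would instead set requires_grad = False for the BatchNorm parameters as well (and typically put those layers in eval mode so their running stats are not updated).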
There are actually threads on that.
Apparently Jeremy also mentions it in part 2 of this year’s course.
It seems to be absolutely crucial to not freeze the BatchNorm layers when doing CNN transfer learning!
One thing I still don’t understand is why learner.fit(3) and learner.fit(3, lr=0.003) gave me different results, as 0.003 is clearly the default value for the learning rate… but that’s for another day.
@TomB Thank you very much for your help.