Hey everyone! I followed the vision tutorial (using the CelebA dataset) and, as usual with fast.ai, got great results with very little effort: ~0.93 accuracy on the validation set.
I'm using all the fast.ai defaults, so the model is a pre-trained ResNet-34 with concatenated average and max pooling and two dense layers (with Dropout and BatchNorm) on top for classification. The ResNet layers are frozen, except for all the BatchNorm layers.
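For reference, my plain-PyTorch re-creation of that head looks roughly like this (the 512-channel feature size, the 0.25/0.5 dropout rates, and the 512-unit hidden layer are my assumptions about fastai's defaults, not something I verified in the fastai source):

```python
import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    """Concatenate adaptive average pooling and adaptive max pooling,
    the way (I believe) fastai's default head does."""
    def forward(self, x):
        return torch.cat([nn.functional.adaptive_avg_pool2d(x, 1),
                          nn.functional.adaptive_max_pool2d(x, 1)], dim=1)

# ResNet-34's final feature map has 512 channels, so concat pooling
# yields 2 * 512 = 1024 features going into the dense layers.
n_features, n_classes = 512, 2
head = nn.Sequential(
    AdaptiveConcatPool2d(),
    nn.Flatten(),
    nn.BatchNorm1d(2 * n_features),
    nn.Dropout(0.25),          # assumed dropout rate
    nn.Linear(2 * n_features, 512),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(512),
    nn.Dropout(0.5),           # assumed dropout rate
    nn.Linear(512, n_classes),
)

# Fake backbone output: batch of 4, 512 channels, 7x7 spatial map.
x = torch.randn(4, n_features, 7, 7)
print(head(x).shape)  # torch.Size([4, 2])
```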
I tried to re-create this setup in plain PyTorch, to learn how it works, and couldn't get above 0.80 accuracy with the BatchNorm layers frozen. I would also expect that one could use the ResNet base model as an embedding: pre-process all the data once, store the outputs, and then train a completely separate two-layer model on these embeddings instead of on raw images, and get comparable performance. I tried that too, and also can't get above 0.80 accuracy. When I unfreeze the BatchNorm weights, it's easy to get 0.90+ accuracy again.
I’m surprised by this behavior and wonder if anyone has an explanation. What am I missing?
Thank you for any tips.