I was trying to use transfer learning for the chest x-ray dataset via Kaggle.
I used Fastai vision per the Fastbook example. I got very good accuracy without any additional params.
However, I got less than 80% accuracy when I used PyTorch (modified udacity tutorial code).
How can I see the learner’s params, including the lr value? learn.summary() shows the optimizer and loss function, but I would like to know the exact lr.
Fastai used Adam and FlattenedLoss of CrossEntropyLoss(). However, udacity used Adam and CrossEntropyLoss(). Is that the reason that Fastai is better?
I used ‘train’ & ‘val’ directories for Fastai, and ‘train’ & ‘test’ directories for udacity. Is that the reason I get different results?
If you don’t pass any parameters to the learner, it will fall back to its default settings. The easiest way to see what they are is to look at the docs for the Learner class.
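If you would rather check defaults from code than from the docs, here is a minimal sketch using the standard-library `inspect` module. `ToyLearner` and `default_args` are hypothetical names made up for this example; with fastai installed you would pass `Learner.__init__` instead, and I believe the value actually in use is also stored on the learner as `learn.lr`, but treat that as an assumption and check it against your version.

```python
import inspect

def default_args(func):
    """Return {parameter name: default value} for every parameter that has a default."""
    params = inspect.signature(func).parameters
    return {name: p.default for name, p in params.items()
            if p.default is not inspect.Parameter.empty}

class ToyLearner:
    """Hypothetical stand-in for a learner class, used only to show the idea."""
    def __init__(self, model, lr=0.001, wd=None):
        self.model, self.lr, self.wd = model, lr, wd

defaults = default_args(ToyLearner.__init__)
print(defaults)  # {'lr': 0.001, 'wd': None}
```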
Honestly, I’m not sure, but I don’t think it would make much of a difference.
Yes, it would: the training and validation sets are different datasets and so will give different accuracies.
Looking at the code, I think there are a couple of things that made a bigger difference:
Learning rates:
The default fastai learning rate is lr = 0.001, versus the lr = 0.0001 you defined in PyTorch. This is probably the biggest reason for the difference: the fastai model is taking bigger steps and therefore learning faster.
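To make the "bigger steps" point concrete, here is a toy sketch. It uses plain gradient descent on a one-parameter quadratic, which is a simplification of my own (your models use Adam on a real network), so it only shows the general effect of a 10x smaller learning rate over the same number of steps.

```python
def gradient_descent(lr, steps, w=0.0, target=3.0):
    """Minimise (w - target)**2 with plain gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad
    return w

steps = 1000
w_small = gradient_descent(lr=1e-4, steps=steps)  # the PyTorch setting from the question
w_large = gradient_descent(lr=1e-3, steps=steps)  # the fastai default mentioned above

# With the same step budget, the larger rate ends up closer to the target.
print(abs(w_small - 3.0), abs(w_large - 3.0))
```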
learn.fine_tune()
When you fine-tune a model, fastai freezes the weights of all layers apart from the last ‘parameter group’ and trains using fit one-cycle.
This means that you are only training the last 2 layers of the model on your data, retaining the earlier, more general convolutions (colour gradients, edge detection, etc.) from the ImageNet pretraining and re-learning the later convolutions that identified cars or fish, etc.
In PyTorch, by contrast, you are training ALL the layers, which means there is a chance that the updates might degrade some of the earlier ‘general’ convolutions, reducing their performance (especially with a small amount of training like 10 epochs).
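For what it's worth, here is a minimal sketch of how the freezing idea could be mimicked in plain PyTorch. The tiny `backbone` and `head` modules, the `set_trainable` helper, the two-class output and the learning rates are all assumptions for illustration; a real run would use a pretrained network and a full training loop.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network: a small "backbone" plus a new "head".
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(8, 2)  # 2 output classes (assumed)
model = nn.Sequential(backbone, head)

def set_trainable(module, trainable):
    """Turn gradient tracking on or off for every parameter in a module."""
    for p in module.parameters():
        p.requires_grad = trainable

# Phase 1: freeze the backbone so that only the head is updated.
set_trainable(backbone, False)
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-3)

x = torch.randn(4, 3, 32, 32)   # fake batch of images
y = torch.tensor([0, 1, 0, 1])  # fake labels
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# Frozen parameters receive no gradient at all.
backbone_grad_is_none = [p.grad is None for p in backbone.parameters()]

# Phase 2: unfreeze everything and continue with a smaller learning rate.
set_trainable(backbone, True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Rebuilding the optimizer after unfreezing is the simplest way to make sure the newly trainable parameters are included.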
I think these are the biggest reasons
Fastai also has a default momentum value, which will also help increase the optimisation step size.
Sorry it’s quite a big answer, hope that sort of answers it!
First of all, thank you so much for your help. I will change the params per your suggestions and rerun to see if it gives any good result like fastai.
I reviewed chapter 7 again and realized that fastai has default stuff (Normalization, test-time-augmented images). I will try with different params to see how far I can mimic fastai.
I did change the Normalization to mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] per the PyTorch documentation. However, there was no improvement in the result.
One thing that I noticed is image augmentation. I used RandomResizedCrop(224). However, “By default, fastai will use the unaugmented center crop image plus four randomly augmented images.” I’m not sure how to apply the same strategy though.
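One way to approximate that strategy by hand is sketched below: predict on the unaugmented image plus `n_aug` augmented copies and average the class probabilities. `tta_predict`, `toy_predict` and `toy_augment` are hypothetical names, the "image" is just a list of floats, and the plain average is an assumption; fastai's own `learn.tta()` may weight the unaugmented prediction differently.

```python
import random

def tta_predict(predict, image, augment, n_aug=4):
    """Average class probabilities over the original image and n_aug augmented views."""
    views = [image] + [augment(image) for _ in range(n_aug)]
    probs = [predict(view) for view in views]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def toy_predict(image):
    """Toy two-class 'model': probability of class 0 is the clipped mean pixel value."""
    p = min(max(sum(image) / len(image), 0.0), 1.0)
    return [p, 1.0 - p]

def toy_augment(image):
    """Toy augmentation: add a little random noise to each pixel."""
    return [px + random.uniform(-0.1, 0.1) for px in image]

random.seed(0)
avg = tta_predict(toy_predict, [0.2, 0.4, 0.6], toy_augment)
print(avg)
```

In a real PyTorch setup, `predict` would wrap the model plus a softmax, and `augment` would be the random training transform.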
There’s a bit more to this too. fine_tune isn’t simply calling fit. It follows Leslie Smith’s One-Cycle policy twice: first frozen (so the backbone of the model is frozen and only the head is trained), then unfrozen (as @lukemshepherd mentioned, the entire model). On top of this, fastai has different layer groups which are utilized in the optimizer, and each can get a slightly different learning rate depending on what is passed in (which is why we can do something like lr = slice(1e-3, 1e-4)). To have a better comparison, you should either mimic the one-cycle policy or just train fastai completely unfrozen with fit.
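A rough, framework-free sketch of both ideas follows, in case it helps when mimicking them in PyTorch. The defaults `div=25`, `div_final=1e5` and `pct_start=0.25`, the cosine shape, and the geometric spacing of a slice across layer groups are how I understand fastai's behaviour; treat them as assumptions and check them against the fastai source. The function names are my own.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 - math.cos(math.pi * pos)) * (end - start) / 2

def one_cycle_lr(step, total_steps, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at a step: warm up to lr_max, then anneal to a tiny value."""
    pos = step / max(total_steps - 1, 1)
    if pos < pct_start:
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

def group_lrs(start, stop, n_groups):
    """Spread learning rates geometrically from start (first group) to stop (last group)."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

schedule = [one_cycle_lr(s, total_steps=100) for s in range(100)]
lrs = group_lrs(1e-3, 1e-4, n_groups=3)  # the slice(1e-3, 1e-4) example from above
print(schedule[0], max(schedule), schedule[-1])
print(lrs)
```

In PyTorch, the per-group rates could be passed to the optimizer as one parameter group per layer group, and the schedule applied by updating each group's `lr` every batch.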
Thank you for your recommendations. It looks like I won’t be able to beat fastai magic easily.
Do you know if anyone ever uses fastai to compete at Kaggle?
@muellerzr Thanks for pointing that out and explaining - I use fit one cycle so much I forget that not all fit methods use it! (edited the heading for clarity)