Fastai v2 vision gets better result than DIY model. How to choose exact params?

Hi everyone,

I was trying to use transfer learning for the chest x-ray dataset via Kaggle.
I used Fastai vision per the Fastbook example. I got very good accuracy without any additional params.
However, I got less than 80% accuracy when I used PyTorch (modified udacity tutorial code).

  1. How to see learner’s params including lr value? learn.summary() shows optim & loss func. I would like to know what is the exact lr.

  2. Fastai used Adam and FlattenedLoss of CrossEntropyLoss(). However, udacity used Adam and CrossEntropyLoss(). Is that the reason that Fastai is better?

  3. I used ‘train’ & ‘val’ directories for Fastai, and ‘train’ & ‘test’ directories for udacity. Is that the reason I get different results?

Here are two notebooks related to my question:

  1. Fastai notebook:
    https://github.com/drniwech/deep.learning/blob/main/kaggle/chest-xray-pneumonia-with-fast-ai-vision.ipynb
  2. udacity notebook:
    https://github.com/drniwech/deep.learning/blob/main/kaggle/chest-xray-pneumonia-transfer-learning-ud-pytorch.ipynb

I would appreciate it if you can help me with this.

Thank you
Niwech

Hi @lukemshepherd,

First of all, thank you so much for your help. I will change the params per your suggestions and rerun to see if it gives any good result like fastai.

I reviewed chapter 7 again and realized that fastai has default stuff (Normalization, test-time-augmented images). I will try with different params to see how far I can mimic fastai.
I did change the Normalization to mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] per Pytorch document. However, no improvement in the result.

One thing that I notice is image augmentation. I used RandomResizeCrop(224). However, “By default, fastai will use the unaugmented center crop image plus four randomly augmented images.” I’m not sure how to apply the same strategy though.

Thank you

There’s a bit more to this too. fine_tune isn’t simply calling fit. It’s following Leslie Smith’s One-Cycle policy twice. First frozen (so the backbone of the model is frozen, and only the head is trained), then unfrozen (as @lukemshepherd mentioned, the entire model). On top of this fastai has different layer groups which are utilized in the optimizer, and each can get a slightly different learning rate depending on what is passed in (which is why we can do something like lr = slice(1e-3, 1e-4). To have a better comparison you should either mimic the one-cycle policy, or just train fastai completely unfrozen with fit.

1 Like

Hi @muellerzr,

Thank you for your recommendations. It looks like I won’t be able to beat fastai magic easily.
Do you know if anyone ever uses fastai to compete at Kaggle?