[lesson1] Trying to understand fit_one_cycle, unfreeze and max_lr

I understood what each of these is from the lesson. But while writing a notebook, I realized there are a few possible ways to combine them.

So I tried 6 different ways of combining fit_one_cycle, unfreeze and max_lr.

Here they are:

Could someone help me figure out what exactly is going on in each of these cases?


Case 1

This is the ideal way to train the model. When you initialize the conv learner, the last layers have randomly initialized weights and the initial layers have weights pretrained on ImageNet. By default, the layers that are already trained are frozen (they won't be trained until you unfreeze them). So the first time you fit, you train only the last layers. Then you unfreeze all the layers, and when you fit again, all the layers get trained. What max_lr does (when you pass a range such as max_lr=slice(1e-6, 1e-4)) is, instead of applying the same learning rate to all the layers, spread different learning rates across them: low for the initial layers and high for the last layers.
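To make the max_lr part concrete, here is a minimal sketch of how a range of learning rates can be spread across layer groups. It assumes, as I understand fastai v1's behaviour, that max_lr=slice(lo, hi) gives the first group lo, the last group hi, and spaces the groups in between evenly on a log scale, while a single number gives every group the same rate. spread_lrs is a hypothetical helper, not a fastai function.

```python
def spread_lrs(max_lr, n_groups):
    """Return one learning rate per layer group (hypothetical helper).

    A plain number gives every group the same learning rate (Case 2).
    slice(lo, hi) gives lo to the first group and hi to the last, with the
    groups in between spaced evenly on a log scale (Case 1). This mirrors
    my understanding of fastai v1, not its actual code.
    """
    if not isinstance(max_lr, slice):
        return [float(max_lr)] * n_groups
    lo, hi = max_lr.start, max_lr.stop
    if lo is None:
        raise ValueError("this sketch only handles slice(lo, hi) with both ends set")
    if n_groups == 1:
        return [hi]
    # Constant ratio between neighbouring groups, so log10(lr) grows linearly.
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio ** i for i in range(n_groups)]


# Three layer groups: low LR for the early (pretrained) layers, high for the head.
print(spread_lrs(slice(1e-6, 1e-4), 3))  # roughly [1e-06, 1e-05, 1e-04]
print(spread_lrs(3e-3, 3))               # [0.003, 0.003, 0.003]
```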

Case 2

Almost the same as above, but without max_lr you are applying the same learning rate across all the layers. That is why the error rate is higher: the initial layers are already trained, so that learning rate is too high for them.

Case 3

Training happens only in the last layers, as the other layers are frozen. Avoid training for too long, as the model may overfit.
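One simple way to notice that kind of overfitting is to compare the per-epoch train_loss and valid_loss values in the table that fit_one_cycle prints: when the training loss keeps falling but the validation loss starts rising, the model is probably starting to overfit. A minimal sketch; first_overfit_epoch is a hypothetical helper and the losses in the example are made up, not real results.

```python
def first_overfit_epoch(train_losses, valid_losses):
    """Return the first epoch index (0-based) where training loss goes down
    but validation loss goes up, or None if that never happens.

    Hypothetical helper: the inputs are per-epoch losses copied from your
    own training output.
    """
    for epoch in range(1, min(len(train_losses), len(valid_losses))):
        train_improved = train_losses[epoch] < train_losses[epoch - 1]
        valid_worsened = valid_losses[epoch] > valid_losses[epoch - 1]
        if train_improved and valid_worsened:
            return epoch
    return None


# Made-up losses for illustration only.
train = [0.90, 0.60, 0.40, 0.30, 0.20]
valid = [0.70, 0.50, 0.45, 0.48, 0.55]
print(first_overfit_epoch(train, valid))  # 3
```

Here training for about 3 epochs (indices 0 to 2) would look like a reasonable stopping point.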

Case 4

The initial layers are frozen, so only the last layers are trained, but I'm not sure what max_lr is doing in this case.
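One way to think about the open question in Case 4 (a simplified model, not fastai's actual optimizer code): the slice still produces one learning rate per layer group, but the frozen groups receive no updates, so only the rate assigned to the last, unfrozen group has any effect. The toy sketch below applies one plain SGD step per group and skips frozen groups. sgd_step and its arguments are hypothetical, and the sketch ignores details such as BatchNorm layers, which fastai may keep trainable inside frozen groups.

```python
def sgd_step(groups, lrs, n_frozen):
    """One plain SGD update per layer group (toy model, hypothetical helper).

    groups:   list of (weights, grads) pairs, one per layer group.
    lrs:      one learning rate per group, e.g. from max_lr=slice(lo, hi).
    n_frozen: number of leading groups that are frozen; their weights are
              copied unchanged, so the learning rates assigned to them are
              never used.
    """
    updated = []
    for i, ((weights, grads), lr) in enumerate(zip(groups, lrs)):
        if i < n_frozen:
            updated.append(list(weights))  # frozen: no update at all
        else:
            updated.append([w - lr * g for w, g in zip(weights, grads)])
    return updated


# Three groups with the first two frozen, like the default frozen learner in Case 1.
groups = [([1.0, 2.0], [0.5, 0.5]), ([1.0], [1.0]), ([1.0], [1.0])]
lrs = [0.001, 0.01, 0.5]  # one rate per group
print(sgd_step(groups, lrs, n_frozen=2))  # [[1.0, 2.0], [1.0], [0.5]]
```

Changing the first two rates makes no difference to the result while those groups stay frozen.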

Case 5

All the layers are trained from the start. It will work, but because the last layers start with random weights, the error will be high at the beginning.

Case 6

Same as Case 4: only the last layers are trained.


Wow. I learnt a lot.
Thank you very much.

So basically this is the ideal flow:

  • fit as usual
  • find the learning rate
  • unfreeze and fit for 2 epochs with max_lr
  • If that gives better results, keep it; otherwise, leave it.
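Since every fit step in this flow goes through fit_one_cycle, it may help to see roughly what the "one cycle" is: within a single call, the learning rate is not fixed at max_lr but warms up to it and then anneals back down. The sketch below is a simplified version, assuming a cosine warm-up over the first pct_start of the steps from max_lr/div_factor up to max_lr, followed by cosine annealing to a much smaller value. The parameter names and defaults are my assumptions loosely modelled on fastai v1 (check its docs for the real ones), and the sketch leaves out the momentum schedule.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0, pct_start=0.3,
                 final_div=1e4):
    """Learning rate at `step` (0-based) under a simplified one-cycle schedule.

    Phase 1 (first pct_start of the steps): max_lr / div_factor -> max_lr.
    Phase 2 (the rest): max_lr -> max_lr / (div_factor * final_div).
    Both phases use cosine interpolation. Names/defaults are assumptions.
    """
    warmup_steps = max(1, round(total_steps * pct_start))
    low_lr = max_lr / div_factor
    if step < warmup_steps:
        start, end = low_lr, max_lr
        pct = step / warmup_steps
    else:
        start, end = max_lr, low_lr / final_div
        pct = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine interpolation: equals `start` at pct=0 and `end` at pct=1.
    return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2


schedule = [one_cycle_lr(s, 100, max_lr=1e-3) for s in range(100)]
print(max(range(100), key=lambda s: schedule[s]))  # 30 (peak after 30% of the steps)
```

So max_lr is the peak of the schedule, not the rate used throughout training.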

Absolutely! But be careful not to overfit the model.


@arunoda Adding to the excellent reply from @isarth: you can find an appropriate learning rate before every fit step using lr_find, and use that value in the fit/training step that immediately follows.
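lr_find runs a short learning-rate range test whose results you then plot (learn.recorder.plot() in fastai v1) and read by eye. As a rough sketch of one common rule of thumb (choose a rate where the loss is still falling steeply, well before it starts to blow up), the helper below takes the recorded learning rates and losses and returns the start of the steepest downward segment on a log scale. steepest_lr is hypothetical, not part of fastai, and the example numbers are made up.

```python
import math


def steepest_lr(lrs, losses):
    """Return the learning rate at the start of the steepest downward
    segment of a loss-vs-log10(lr) curve (hypothetical helper).

    lrs must be positive and increasing, as in an LR range test.
    """
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        d_log_lr = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / d_log_lr
        if slope < best_slope:  # more negative = loss falling faster
            best_lr, best_slope = lrs[i - 1], slope
    return best_lr  # None if the loss never decreases


# Made-up range-test results for illustration only.
lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [2.30, 2.28, 2.10, 1.40, 1.20, 3.50]
print(steepest_lr(lrs, losses))  # 0.0001
```

Returning the start of the segment rather than its end is a deliberately conservative choice; reading the plot yourself is still worth doing.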


Thanks. I’ll try that.

Hi, why 2 epochs? I think the number of epochs is also a parameter that should be tuned.

Thank you, good explanation!

Thank you very much! I learnt a lot.

Because of you, many beginners understood what's going on here! Thank you so much!

Thank you for this wonderful post.