This is the ideal way of training the model. When you initialize the Conv learner, the last layers have randomly assigned weights and the initial layers have weights pretrained on ImageNet. By default, layers that are already trained are frozen (they won't be trained until you unfreeze them). So when you fit the first time, you are training only the last layers. Then you unfreeze all the layers, and now when you fit, all layers get trained. What max_lr does is, instead of applying the same learning rate to all the layers, apply different LRs across the layers: low for the initial layers and high for the last layers.
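A minimal sketch of that workflow, assuming fastai v1's `cnn_learner` and `fit_one_cycle` API; the `data` object (an ImageDataBunch) and the resnet34 backbone are just illustrative choices:

```python
from fastai.vision import *

# `data` is assumed to be an existing ImageDataBunch
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

learn.fit_one_cycle(4)   # model starts frozen: only the randomly initialized head trains
learn.unfreeze()         # make every layer trainable
# Discriminative LRs: ~1e-5 for the early pretrained layers, up to ~1e-3 for the head
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))
```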
Case 2
Almost the same as the above, but without max_lr you are applying the same LR across all the layers. That is why the error rate is high: the initial layers are already trained, so that LR is too high for them.
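For contrast, a sketch of this case (same assumed setup as above): unfreezing and then fitting without max_lr means one learning rate hits every layer group, including the pretrained early ones.

```python
learn.fit_one_cycle(4)   # head only (model starts frozen)
learn.unfreeze()
learn.fit_one_cycle(4)   # no max_lr: the same LR is applied to the early layers too
```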
Case 3
Training only happens in the last layers, as the other layers are frozen. You should avoid training for too long, as it may overfit your model.
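In code this case is just fitting without ever unfreezing (same hypothetical learner as above):

```python
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(12)  # many epochs on the head alone; watch validation loss for overfitting
```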
Case 4
The initial layers are fixed, so only the last layers are training, but I'm not sure what max_lr is doing in this case.
Case 5
Training all the layers from the start. It will work, but because the last layers are randomly initialized, it will give a high error at the beginning.
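As a sketch (same assumed setup), this amounts to unfreezing immediately, before any head-only training:

```python
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.unfreeze()         # everything trainable from the start
learn.fit_one_cycle(4)   # the randomly initialized head drives up the early error
```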
Case 6
Same as case 4: only the last layers are getting trained.
@arunoda Adding to the excellent reply from @isarth: you can find the appropriate learning rate before every fit step using lr_find, and use that value in the immediate fit/training step.
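For example (assuming fastai v1; the 1e-3 value below is hypothetical, read off your own plot):

```python
learn.lr_find()          # runs a mock training sweep over a range of learning rates
learn.recorder.plot()    # loss vs. LR; pick a value where the loss is still falling steeply
learn.fit_one_cycle(4, max_lr=1e-3)  # hypothetical value chosen from the plot
```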