Lesson 1 : Retrain doesn't improve Resnet 50

andrew77 · March 14, 2020, 9:34am

Some questions

are these the normal steps?
i.e lr_find, then fit_one_cycle , unfreeze then retrain based on the max_lr parameter?
did I use the slice correctly?

3)the reason is not better than 0.04 is it because it requires more epoch as 3 only brings down to 0.35.

Is it usually quite difficult to beat the baseline of 0.04 and is it ok to export this model if this is the better one?

Thanks.

Archaeologist · March 14, 2020, 1:59pm

This is what I found unusual in your code:

You do not set a learning rate in your 1st training cycle [35]. Try this instead, for example

learn.fit_one_cycle(8, max_lr=1e-3)

After this, you should again use lr_find to get a new learning rate for your 2nd cycle.

andrew77 · March 14, 2020, 3:15pm

Apologies, I do not understand what you meant by not setting a learning rate at [35]. I don’t think I did, I only set the LR at [38].

Archaeologist · March 14, 2020, 3:49pm

Yes Andrew, you did not set a LR, but I recommend that you do set it perhaps using my code example. This is what Jeremy does almost all the time.

andrew77 · March 15, 2020, 7:38am

Hi @Archaeologist, thanks for the prompt reply.

max_lr=1e-3

Does it mean that it will be using LR of 1e-3 consistently from start to the end?

I’m currently at Lesson 2 and Jeremy didn’t use it as of now. Is it something that is practised later on?

Screenshot from Lesson 2, Jeremy’s lecture

I run this at the lesson 2 notebook

Seems like all roads leads to Rome.

Cheers.

Archaeologist · March 15, 2020, 8:54am

OK it has been a while ago that I did lesson 2…

To clarify: fit_one_cycle() always has a maximum learning rate (max_lr), if you do not set it, it will use the default, which happens to be 1e-3 I did not know this but in your (and Jeremy’s) example it just happens to be the right value to pick.

The 2nd issue I mentioned is that you do not use the learning rate finder again when you do your 2nd training. A neural network modifies its internal state while training. So in your initial example (where you use the lr_find() only once at the beginning) the 1e-3 is right for your first training, but when you use learn_fit_one_cycle() again, you should call lr_find() beforehand for an estimate of a new max. learning rate.

Jeremy has a good explanation on learning rates in lesson 3 or 4 if I remember correctly.

To come back to your initial question:

are these the normal steps?
i.e lr_find, then fit_one_cycle , unfreeze then retrain based on a NEW max_lr parameter obtained with a second lr_find()
did I use the slice correctly? looks good but I cannot say because it is unclear where you found the learning rates you use?