Why use the values found with `lr_find` on resnet34 for resnet50?

In the Lesson 1 Pets notebook, we use `lr_find` and `learn.recorder.plot()` to plot the loss against the learning rate and read off the lowest and highest values, as below.

Naturally, you would think that when training resnet50 we should use the same approach to find the lowest and highest learning rate values, rather than reusing the values from resnet34.

At first, the code seems to follow this thinking, as seen below, and its output looks different from that of resnet34.

If we are to read the lowest and highest values off the resnet50 plot, then the lower bound should be the learning rate at the lowest loss (1e-1) shrunk by 10 times, i.e. 1e-1/10 = 1e-2. Of course, a lower bound of 1e-2 does not make sense, but does that leave us no choice other than using resnet34's learning rate range?
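The "lowest-loss learning rate divided by 10" heuristic described above can be sketched in plain Python. The function name and the toy curve here are made up for illustration; this is not fastai's actual implementation:

```python
# Hypothetical sketch of the heuristic: given the (learning rate, loss)
# pairs that lr_find records, take the learning rate at the minimum loss
# and shrink it by 10x to get a candidate rate.

def suggest_lr(lrs, losses):
    """Return (lr at min loss) / 10 as a candidate learning rate."""
    min_idx = min(range(len(losses)), key=lambda i: losses[i])
    return lrs[min_idx] / 10

# Toy curve where the loss bottoms out at lr = 1e-1:
lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [1.30, 1.10, 0.80, 0.55, 2.00]
print(suggest_lr(lrs, losses))  # ~1e-2, the value discussed above
```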

However, the notebook used those values from resnet34 instead.

Why is that?


I realized that there is a big difference between these two plots.

The first one uses the unfrozen model (ResNet34), but the second plot uses the frozen model (ResNet50).

Hi Daniel,

As you noticed, the two plots show different `lr_find` output for the resnet34 learner and the resnet50 learner.

It is true that you could use `lr_find` with the resnet50 learner to find a good learning rate, so from the second plot we could have used 1e-1/10 = 1e-2 for the resnet50 learner. In the notebook, a custom max_lr was not passed to fit_one_cycle, so it used the default, which is set to 0.003 in the code. That is lower than the 1e-2 you noticed, and still on the decreasing-loss part of the lr_find curve, so the default 0.003 should still work, though 1e-2 should let the learner train faster.
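To see how that default max_lr is used, here is a rough sketch of the one-cycle idea in plain Python: the learning rate ramps up to max_lr and then anneals back down over training. This is only an illustration of the shape of the schedule (the warm-up and annealing details differ from fastai's exact code); the 3e-3 default matches the value mentioned above:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=3e-3, pct_start=0.3):
    """Toy one-cycle schedule: linear ramp up, cosine anneal down."""
    warm = int(total_steps * pct_start)
    if step < warm:                               # ramp up to max_lr
        frac = step / max(warm, 1)
        return max_lr * (0.1 + 0.9 * frac)        # start at max_lr / 10
    frac = (step - warm) / max(total_steps - warm, 1)
    return max_lr * 0.5 * (1 + math.cos(math.pi * frac))  # anneal down

schedule = [one_cycle_lr(s, 100) for s in range(100)]
print(max(schedule))  # peaks at max_lr = 3e-3
```

The key point is that the whole schedule stays at or below max_lr, so as long as max_lr sits on the decreasing-loss region of the lr_find plot, training should be stable.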

For the unfrozen resnet50 part, you could run `lr_find` again on the unfrozen learner to find a new max_lr. In the notebook, the max_lr found for resnet34 was reused (or perhaps Jeremy and co. ran lr_find separately and found that these hyperparams worked fine?), and it did seem to work ok.
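For the unfrozen case, the max_lr is usually passed as a slice, e.g. `max_lr=slice(1e-6, 1e-4)`, and fastai spreads the rates across the layer groups so earlier layers get smaller rates. If memory serves, fastai v1 does this geometrically (its `even_mults` helper); the re-implementation below is just an illustration of that spread, not fastai's code:

```python
# Sketch of discriminative learning rates: spread n rates geometrically
# from `start` to `stop`, one per layer group (assumed behaviour of
# fastai v1 when max_lr=slice(start, stop) is passed to fit_one_cycle).

def even_mults(start, stop, n):
    """n values from start to stop, each a constant multiple of the last."""
    if n == 1:
        return [stop]
    mult = (stop / start) ** (1 / (n - 1))
    return [start * mult**i for i in range(n)]

print(even_mults(1e-6, 1e-4, 3))  # ~[1e-6, 1e-5, 1e-4] for 3 layer groups
```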

TL;DR: you can indeed run `lr_find` to get good max_lr values for different learners, but in the notebook the reused values seemed to work fine (which shows the robustness of the fast.ai library defaults!)

Thanks for your reply! I will experiment with it and get back to you later.