I am running `lr_find` on a pretrained resnet34 on a dataset, and it is giving me a plot like this.

Is this a bug? Or is there no good learning rate for this data?

I am confused about what rate to select here.


I’m not using fastai, but I’ve implemented something similar in my own code, and I’ve only seen a plot like that when I’d messed something up in my data, model, or training code.

What I would try is making all layers learnable and resetting the weights to random values (so not loading the pretrained weights), then seeing what sort of learning curve you get, and also what the initial loss is. If it still doesn’t work, there’s something wrong with the model or the data. If it *does* work, the issue is with the pretrained weights.
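A minimal PyTorch sketch of that check (the small model here is a stand-in, not your actual resnet34; with torchvision you could build the same architecture with fresh weights via `models.resnet34(weights=None)`). A useful yardstick: with random weights and C balanced classes, the initial cross-entropy loss should sit near ln(C); a wildly different value points at the data pipeline rather than the pretrained weights.

```python
import torch
from torch import nn

def reset_and_unfreeze(model: nn.Module) -> nn.Module:
    """Re-initialize every layer to random weights and make all of them trainable."""
    for m in model.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()  # re-initialize this layer's weights
    for p in model.parameters():
        p.requires_grad = True   # unfreeze every layer
    return model

# Stand-in model: flatten a 3x32x32 image into a 10-class linear head.
model = reset_and_unfreeze(nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)))

# Sanity-check the initial loss on a random batch.
x = torch.randn(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
loss = nn.functional.cross_entropy(model(x), y)
print(float(loss))  # should land roughly near ln(10) ≈ 2.3 for 10 classes
```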

Thanks for the reply!

I have tried resetting the learner, and I got a normal-looking graph from `lr_find` on the first run (with precompute on).

After I ran the fit once using lr = 0.07 based on that graph, I wanted to check how the learning rate behaved, and I got back the graph mentioned in the OP.

I am not sure what to make of the loss curve here. (I am also not sure whether running `lr_find` on the same learner even makes sense.)

I seem to remember that we can change the learning rate during the training process (I might be wrong). Can `lr_find` only be run once per learner, at the start? If someone can point to a kernel with multiple runs of `lr_find` in the same training process, with different learning-rate outputs, that would be handy too!

The way the LR finder works is by training a batch using the starting LR, then training the next batch with a slightly higher LR, then the next batch with an even higher LR, and so on.

When the model isn’t trained at all yet, the loss changes quite rapidly with each new batch. But once the model has been trained for a few epochs, the loss no longer changes very quickly from one batch to the next.

So what the plot from cell 36 shows is that, at this point in the training process, learning rates up to 1e-1 don’t really change the loss very much, but if you go higher than 1e-1 the loss explodes.

3 Likes

Thanks for the explanation! It seems so obvious now.