Lesson 6 - Official topic

For those last weights, does it matter how many are added? Is the initialization random, and is there a benefit to adding more layers of weights vs. a smaller number?

The thing is that you don’t need to know the ideal learning rate to that degree of precision. A rough estimate at the right order of magnitude is more than enough to train your model efficiently.

4 Likes
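To make that concrete, here is a rough sketch (not from the lesson; it assumes a standard fastai vision setup with a DataLoaders object `dls` already built): learning rates in the same ballpark tend to train about equally well.

from fastai.vision.all import *

# Hypothetical comparison: any base_lr within the right order of magnitude
# should give very similar results, so pinpointing the "ideal" value isn't needed.
# `dls` is assumed to be an existing DataLoaders.
for lr in (1e-3, 2e-3, 5e-3):
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1, base_lr=lr)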

In my experience, it is not necessary.
Depending on how long I train, I generally run lr_find 2-3 times per training (where one happens at the very beginning).
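For example, a sketch of what those 2-3 runs might look like (assuming a fastai Learner built from a pretrained model; the learning rates are placeholders you would read off the two plots):

learn.lr_find()                              # 1st run: before any training
learn.fit_one_cycle(3, 2e-3)                 # train the new head (body still frozen)
learn.unfreeze()
learn.lr_find()                              # 2nd run: the plot usually looks much flatter now
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-4))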

Why base_lr/2? Why not another fraction?

1 Like

What does it mean when the lr_finder plot no longer looks like that? I’ve seen nearly flat plots.
Does that mean the model is trained?

1 Like

Yes, the added layers are randomly initialized. In our experience, you don’t need that many new layers, since the body of the network has already learned so much.

2 Likes
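As a minimal PyTorch illustration of that idea (this is not fastai’s actual head, which is a bit bigger; n_classes is a made-up number, and older torchvision versions use pretrained=True instead of the weights argument):

import torch.nn as nn
from torchvision import models

n_classes = 10  # hypothetical number of classes for the new task

# The body keeps the weights it learned on ImageNet...
model = models.resnet34(weights="IMAGENET1K_V1")
# ...while the replacement final layer starts from a random initialization.
model.fc = nn.Linear(model.fc.in_features, n_classes)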

In the learner.fine_tune() method, why do we not call the learning rate finder again and take the min/10 learning rate, rather than just doing lr /= 2?

2 Likes

For all those defaults, just try whatever you want to experiment with, and see if you get better results :wink:

3 Likes

I’m glad Jeremy is talking about the two different shapes for lr_find. Last year I found that a lot of people got confused by the second shape for a pretrained network.

7 Likes

Yes, it means the model has already learned things about the task at hand and does not have a randomly initialized part.

1 Like

When we freeze the last layer during fine_tune, do we obtain the gradients from the last layer but not change it, with the gradients just forwarded to the following layers, which are responsible for learning? Am I correct? Correct me if I’m wrong.

Because in practice, we didn’t find we needed it. This is a general method to quickly do transfer learning on a new dataset and get a very good baseline. You can always do it in two stages with two lr_find calls and see if you get better results.

2 Likes
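In outline, this is roughly the two-stage recipe fine_tune automates (paraphrased from memory, so the exact defaults and scheduler arguments in the fastai source may differ):

def fine_tune_sketch(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr))            # stage 1: train the random head only
    base_lr /= 2                                                  # head is no longer random, so back off a bit
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr))  # stage 2: discriminative LRs for the whole model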

Sorry, again on the lr_find :slight_smile: Was any effort spent on finding a robust "automatic" way of selecting a (possibly conservative) LR? When I try to use fastai in non-interactive jobs, that would be very useful.

2 Likes

Thank you :slight_smile:

Freezing means you only compute the gradients for part of the model (in this case the last layers). You don’t even compute the gradients for the rest of the model, let alone update the corresponding parameters.

1 Like
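A bare-bones PyTorch illustration of that (not fastai’s actual internals; the tiny model and data are made up):

import torch
import torch.nn as nn

body = nn.Sequential(nn.Linear(10, 10), nn.ReLU())   # stands in for the pretrained body
head = nn.Linear(10, 2)                              # the newly added layers
model = nn.Sequential(body, head)

# Freezing: the frozen parameters get no gradients at all, and the optimizer
# only sees the head's parameters, so only the head is updated.
for p in body.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-3)

x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()      # no gradients are computed for the frozen body
opt.step()           # only the head's parameters move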

Yes. That’s why you have suggested values.

1 Like
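For non-interactive jobs, one option is to use those suggested values programmatically rather than reading the plot. A sketch, with the caveat that the attribute names on the object lr_find returns differ between fastai versions, and the safety factor here is an arbitrary, conservative choice:

def conservative_lr(learn, safety=3.0):
    suggestion = learn.lr_find()
    # Older fastai versions return lr_min/lr_steep, newer ones return valley.
    lr = getattr(suggestion, "valley", None) or getattr(suggestion, "lr_min", None)
    return lr / safety

learn.fine_tune(5, base_lr=conservative_lr(learn))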

Is

learn.fit_one_cycle(3, lr_max=1e-3)
learn.fit_one_cycle(7, lr_max=1e-3)

the same as

learn.fit_one_cycle(10, lr_max=1e-3)?

Absolutely not, that is what Jeremy is explaining right now.

3 Likes
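The reason is the learning rate schedule: each call to fit_one_cycle runs its own full warm-up and anneal over just the epochs you give it, so two short cycles look nothing like one long one. A quick illustrative sketch (the constants are made up, not fastai’s exact defaults):

import numpy as np
import matplotlib.pyplot as plt

# Rough one-cycle shape: cosine warm-up for the first pct_start of steps,
# then a cosine anneal down to a very small learning rate.
def one_cycle(n_steps, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    n_up = int(n_steps * pct_start)
    up = lr_max/div + (lr_max - lr_max/div) * (1 - np.cos(np.linspace(0, np.pi, n_up))) / 2
    down = lr_max/div_final + (lr_max - lr_max/div_final) * (1 + np.cos(np.linspace(0, np.pi, n_steps - n_up))) / 2
    return np.concatenate([up, down])

steps_per_epoch = 100  # hypothetical
plt.plot(np.concatenate([one_cycle(3*steps_per_epoch, 1e-3), one_cycle(7*steps_per_epoch, 1e-3)]), label="3 epochs then 7 epochs")
plt.plot(one_cycle(10*steps_per_epoch, 1e-3), label="10 epochs")
plt.xlabel("batch"); plt.ylabel("learning rate"); plt.legend(); plt.show()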

Is there something special that needs to be done to create a new, randomly initialized learner? I feel like I’ve had trouble before when training a learner, then going back to redefine a new learner and re-fit. I don’t have any current examples where I’m having issues, though, so it’s possible I just wasn’t doing quite what I thought I was.

If you train for 50 or 100 epochs, will your model generalize better on brand new data it has not seen before? In other words, will you have a very long tail, with the model still improving?