Lesson 6 - Official topic

For those last added weights, does it matter how many are added? Are they random, and is there a benefit to adding more layers of weights vs. a smaller number?

The thing is that you don't need to know the ideal learning rate to that degree of precision. A rough estimate at the right order of magnitude is more than enough to train your model efficiently.

4 Likes

In my experience it is not necessary.
Depending on how long I train, I generally run lr_find 2-3 times per training run (one of them at the very beginning).

Why do we use base_lr/2 – why not another fraction?

1 Like

What does it mean when the lr_finder plot no longer looks like that? I've seen nearly flat plots.
Does that mean the model is trained?

1 Like

Yes, the added layers are randomly initialized. In our experience, you don't need that many new layers, since the body of the network has already learned so much.

2 Likes
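
To make the "added layers" concrete, here is a minimal PyTorch sketch (not fastai's exact head, which also adds pooling, BatchNorm and Dropout): the pretrained body keeps its learned weights, while the replacement head is randomly initialized.

```python
import torch.nn as nn
from torchvision import models

# Pretrained body: weights learned on ImageNet.
model = models.resnet34(pretrained=True)
n_features = model.fc.in_features  # size of the body's final activations

# Swap the classifier for a small, randomly initialized head.
# A couple of layers is usually plenty, since the body has already
# learned rich features; 10 here stands in for your number of classes.
model.fc = nn.Sequential(
    nn.Linear(n_features, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
```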

In the learner.fine_tune() method, why do we not call the learning rate finder again and take the min/10 learning rate, rather than doing lr /= 2?

2 Likes

For all those defaults, just try whatever you want to experiment with, and see if you get better results :wink:

3 Likes

I'm glad Jeremy is talking about the two different shapes for lr_find. I found last year a lot of people got confused by the second shape for a pretrained network.

7 Likes

Yes, it means the model has already learned things about the task at hand and does not have a randomly initialized part.

1 Like

When we freeze layers during fine_tune, do we still obtain the gradients from those layers but not change them, with the gradients just forwarded to the following layers, which are responsible for learning? Am I correct? Correct me if I'm wrong.

Because in practice, we didn't find we needed it. This is a general method to quickly do transfer learning on a new dataset and get a very good baseline. You can always do it in two stages with two lr_finds and see if you get better results.

2 Likes
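
For reference, here is a rough sketch of the two options, assuming a `Learner` called `learn` (simplified: the real `fine_tune` also uses discriminative learning rates for the unfrozen phase, and the learning rate values below are just placeholders).

```python
# Option 1: the one-liner. fine_tune roughly does:
#   freeze the body -> fit_one_cycle(freeze_epochs, base_lr)
#   -> halve base_lr -> unfreeze -> fit_one_cycle(epochs, base_lr / 2)
learn.fine_tune(5, base_lr=2e-3)

# Option 2: the same idea by hand, with an lr_find before each stage.
learn.freeze()
learn.lr_find()               # inspect the plot / suggestion
learn.fit_one_cycle(1, 2e-3)  # train the new head

learn.unfreeze()
learn.lr_find()               # the plot usually looks different now
learn.fit_one_cycle(5, 1e-3)  # pick a value from the new plot
```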

Sorry, again on the lr_find :slight_smile: Was any effort put into finding a robust "automatic" way of selecting a (possibly conservative) LR? When I try to use fastai in non-interactive jobs, that would be very useful.

2 Likes

Thank you :slight_smile:

Freezing means you only compute the gradients for part of the model (in this case the last layers). You don't even compute the gradients for the rest of the model, let alone update the corresponding parameters.

1 Like
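
In plain PyTorch terms, freezing just sets `requires_grad` to `False` on the frozen parameters, so autograd skips them and the optimizer never updates them. A minimal sketch with a hypothetical body/head split (fastai does this via parameter groups with `learn.freeze()`/`learn.unfreeze()`, and by default keeps BatchNorm layers trainable even in the frozen part):

```python
import torch
import torch.nn as nn

# Hypothetical two-part model: a "body" to freeze and a "head" to train.
body = nn.Sequential(nn.Linear(100, 50), nn.ReLU())
head = nn.Linear(50, 10)
model = nn.Sequential(body, head)

# Freeze the body: autograd computes no gradients for these parameters.
for p in body.parameters():
    p.requires_grad = False

# Only parameters that still require gradients get updated.
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-3)
```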

Yes. That's why you have suggested values.

1 Like
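
For non-interactive jobs you can use the suggested values programmatically instead of reading the plot. A rough sketch, with the caveat that the attributes returned by `lr_find` have changed between fastai versions (earlier releases return things like `lr_min`/`lr_steep`, later ones `valley`), so check what your version gives you:

```python
# Run the LR finder and grab its suggestion instead of eyeballing the plot.
suggestion = learn.lr_find()

# Attribute name depends on your fastai version (e.g. lr_min, lr_steep or valley).
lr = suggestion.valley

# Use a conservative fraction of it if you want extra safety in batch jobs.
learn.fit_one_cycle(5, lr)
```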

Is this:

```python
learn.fit_one_cycle(3, 1e-3)
learn.fit_one_cycle(7, 1e-3)
```

the same as this?

```python
learn.fit_one_cycle(10, 1e-3)
```

Absolutely not, that is what Jeremy is explaining right now.

3 Likes
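
The difference is that every `fit_one_cycle` call runs its own complete one-cycle schedule: the learning rate warms up to the maximum and then anneals back down. So 3 + 7 epochs gives you two warm-up/anneal cycles, while a single 10-epoch call gives one long cycle. A quick way to see it, using an approximate re-implementation of the schedule (the defaults below mirror fastai's, but treat this as an illustration, not the library's code):

```python
import numpy as np
import matplotlib.pyplot as plt

def one_cycle_lr(n_steps, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Approximate one-cycle schedule: cosine warm-up from lr_max/div to
    lr_max, then cosine anneal down to lr_max/div_final."""
    warm = int(n_steps * pct_start)
    up = lr_max / div + (lr_max - lr_max / div) * (1 - np.cos(np.linspace(0, np.pi, warm))) / 2
    down = lr_max / div_final + (lr_max - lr_max / div_final) * (1 + np.cos(np.linspace(0, np.pi, n_steps - warm))) / 2
    return np.concatenate([up, down])

steps_per_epoch = 100  # assumption: depends on dataset size and batch size

# fit_one_cycle(3) followed by fit_one_cycle(7): two warm-ups, two anneals.
split = np.concatenate([one_cycle_lr(3 * steps_per_epoch, 1e-3),
                        one_cycle_lr(7 * steps_per_epoch, 1e-3)])

# fit_one_cycle(10): a single warm-up and one long anneal.
single = one_cycle_lr(10 * steps_per_epoch, 1e-3)

plt.plot(split, label="fit_one_cycle(3) then fit_one_cycle(7)")
plt.plot(single, label="fit_one_cycle(10)")
plt.xlabel("training step")
plt.ylabel("learning rate")
plt.legend()
plt.show()
```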

Is there something special that needs to be done to create a new random learner? I feel like I've had trouble before when running and training a learner and then going back to re-defining a new learner and re-fitting. I don't have any current examples where I'm having issues though, so it's possible I was just not doing quite what I thought I was.

If you train for 50 or 100 epochs, will your model generalize better on brand new data it has not seen before? In other words, you'll have a very long tail, but the model will still be improving?