Lesson 6 - Official topic

Note: This is a wiki post - feel free to edit to add links from the lesson or other useful info.

Resources

Links from lesson

  • Python for Data Analysis - Wes McKinney - link

Other useful links

  • BCE vs BCEWithLogitsLoss - PyTorch Forum Discussion - link (see the sketch below this list)
  • Mixed Precision Training Comparison - link
  • Notes by @Lankinen
16 Likes
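
To make the first link above concrete, here is a minimal sketch (plain PyTorch, not from the lesson) showing that BCEWithLogitsLoss is just a sigmoid followed by BCELoss, fused into one numerically more stable operation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)                     # raw model outputs, no sigmoid
targets = torch.randint(0, 2, (4, 1)).float()  # binary labels

# Fused version: sigmoid + BCE computed together via the log-sum-exp
# trick, which avoids overflow with large-magnitude logits
fused = nn.BCEWithLogitsLoss()(logits, targets)

# Equivalent two-step version: sigmoid first, then BCELoss
split = nn.BCELoss()(torch.sigmoid(logits), targets)

print(fused.item(), split.item())  # equal up to floating-point error
```

The two values match; the fused version is simply the safer default, which (as I recall) is the gist of the linked discussion.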

Both sound great!

3 Likes

As we approach the end of part 1, are there any plans for a part 2? If so, any idea when?

16 Likes

Follow-up on this: any thoughts on what you’d like to cover in part 2?

3 Likes

I will make a note about the part 2 question and ask Jeremy at the break (it’s out of flow with the current topic, but I know people are wondering).

24 Likes

Also, there might be incorrect tagging by the experts in the dataset, which could have confused the model.

2 Likes

Extra read: here’s a link to my interview with Leslie Smith (author of the cyclical learning rates work).

7 Likes

Is the learning rate plot in lr_find based on a single mini-batch?

2 Likes

Why don’t we need a minimum for the learning rate?

During the lr_find() method, every learning rate is applied to a different batch, right? Is the network reset to its initial state after each trial?

No, we take steps on 100 different mini-batches, not just one, increasing the learning rate at each one.

8 Likes
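
For anyone wondering what “increasing the learning rate at each mini-batch” looks like, here is a minimal mock-up of the idea behind an LR finder. It is not fastai’s actual implementation, and the model and data are dummies so the sketch runs on its own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# dummy data and model, purely so the sketch runs end to end
batches = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(100)]
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

# sweep the learning rate exponentially from lr_start to lr_end
lr_start, lr_end = 1e-7, 10.0
mult = (lr_end / lr_start) ** (1 / (len(batches) - 1))

opt = torch.optim.SGD(model.parameters(), lr=lr_start)
lrs, losses = [], []

for i, (xb, yb) in enumerate(batches):
    lr = lr_start * mult ** i
    for group in opt.param_groups:
        group["lr"] = lr

    loss = loss_fn(model(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()                        # the weights ARE updated at every step

    lrs.append(lr)
    losses.append(loss.item())
    if loss.item() > 4 * min(losses):  # stop once the loss blows up
        break
```

A real implementation (like fastai’s) also snapshots the weights before the sweep and restores them afterwards, which answers the reset question above: the network is reset once at the end, not after each trial.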

Here is the link to the paper Jeremy refers to.

Cyclical Learning Rates for Training Neural Networks

2 Likes

+1 to this… is it the same mini-batch every time, or a different one? Are the weights updated each time?

There are other implementations of the LR finder, including one in PyTorch Lightning and some community-written callbacks for Keras (e.g., here).

1 Like

Why would an “ideal” learning rate found with a single mini-batch at the start of training keep being a good learning rate even after several epochs and further loss reductions? Wouldn’t the ideal learning rate be a local property of the loss function?

5 Likes

For the LR finder, why use the steepest point and not the minimum?

1 Like

No: since the network isn’t really training while the learning rate is too small, you don’t need a minimum.

1 Like
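
On the steepest-vs-minimum question above: given the (learning rate, loss) pairs recorded during the sweep, both suggestions are easy to compute. A small sketch (NumPy only; the loss curve is synthetic, purely for illustration):

```python
import numpy as np

# synthetic LR-finder curve: the loss falls, flattens, then diverges
lrs = np.logspace(-7, 1, 100)
log_lrs = np.log10(lrs)
losses = 2.0 - np.tanh(log_lrs + 4) + np.exp(log_lrs - 0.5)

# "min" suggestion: the learning rate with the lowest recorded loss
lr_at_min = lrs[np.argmin(losses)]

# "steepest" suggestion: where the loss falls fastest w.r.t. log(lr)
lr_steep = lrs[np.argmin(np.gradient(losses, log_lrs))]

print(f"min-loss lr: {lr_at_min:.1e}, steepest lr: {lr_steep:.1e}")
```

The minimum-loss point sits right at the edge of divergence, so training there is risky; the steepest point (or, as fastai’s lr_min suggestion does, the minimum-loss rate divided by 10) leaves a safety margin.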

When should we run the learning rate finder? Only at the beginning, or should we run it again after a couple of epochs?

1 Like

There are lr_min and lr_steep values in the code, which pick out the minimum-loss and steepest-slope learning rates; they’re part of the lr_find function (see the usage sketch below).
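
A quick usage sketch, assuming the fastai v2 API as of this course (the return type of lr_find has changed across versions, so treat this as illustrative rather than definitive):

```python
from fastai.vision.all import *

# the standard pets example from the course, just to get a Learner
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path/"images"),
    pat=r"(.+)_\d+.jpg$", item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)

# lr_find runs the mini-batch sweep discussed above and returns
# suggested rates; here, lr_min and lr_steep
suggested = learn.lr_find()
print(f"lr_min={suggested.lr_min:.2e}, lr_steep={suggested.lr_steep:.2e}")

learn.fine_tune(2, base_lr=suggested.lr_steep)
```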