Lesson 6 - Official topic

Note: This is a wiki post - feel free to edit to add links from the lesson or other useful info.

Resources

Links from lesson

  • Python for Data Analysis - Wes McKinney - link

Other useful links

  • BCE vs BCEWithLogitsLoss - PyTorch Forum Discussion - link (see the sketch below this list)
  • Mixed Precision Training Comparison - link
  • Notes by @Lankinen
16 Likes
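
To make the first link above concrete, here is a minimal sketch (plain PyTorch, not from the lesson) showing that BCEWithLogitsLoss is just a sigmoid followed by BCELoss, fused into one numerically more stable operation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)                     # raw model outputs, no sigmoid
targets = torch.randint(0, 2, (4, 1)).float()  # binary labels

# Fused version: sigmoid + BCE computed together via the log-sum-exp
# trick, which avoids overflow with large-magnitude logits
fused = nn.BCEWithLogitsLoss()(logits, targets)

# Equivalent two-step version: sigmoid first, then BCELoss
split = nn.BCELoss()(torch.sigmoid(logits), targets)

print(fused.item(), split.item())  # equal up to floating-point error
```

The two values match; the fused version is simply the safer default, which (as I recall) is the gist of the linked discussion.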

Both sound great!

3 Likes

As we approach the end of part 1, are there any plans for a part 2? If so, any idea when?

16 Likes

Follow-up on this: any thoughts on what you’d like to cover in part 2?

3 Likes

I will make a note about the part 2 question and ask Jeremy at the break (it’s out of flow with the current topic, but I know people are wondering).

24 Likes

Also, there might be incorrect tagging by the experts in the dataset, which could have confused the model.

2 Likes

Extra read: here’s a link to my interview with Leslie Smith (author of the cyclical learning rates work).

7 Likes

Is the learning rate plot in lr_find based on a single mini-batch?

2 Likes

Why don’t we need a minimum for the learning rate?

During the lr_find() method, every learning rate is applied to a different batch, right? Is the network reset to its initial state after each trial?

No, we take steps on 100 different mini-batches, not just one, increasing the learning rate at each one.

8 Likes
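
For anyone wondering what “increasing the learning rate at each mini-batch” looks like, here is a minimal mock-up of the idea behind an LR finder. It is not fastai’s actual implementation, and the model and data are dummies so the sketch runs on its own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# dummy data and model, purely so the sketch runs end to end
batches = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(100)]
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

# sweep the learning rate exponentially from lr_start to lr_end
lr_start, lr_end = 1e-7, 10.0
mult = (lr_end / lr_start) ** (1 / (len(batches) - 1))

opt = torch.optim.SGD(model.parameters(), lr=lr_start)
lrs, losses = [], []

for i, (xb, yb) in enumerate(batches):
    lr = lr_start * mult ** i
    for group in opt.param_groups:
        group["lr"] = lr

    loss = loss_fn(model(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()                        # the weights ARE updated at every step

    lrs.append(lr)
    losses.append(loss.item())
    if loss.item() > 4 * min(losses):  # stop once the loss blows up
        break
```

A real implementation (like fastai’s) also snapshots the weights before the sweep and restores them afterwards, which answers the reset question above: the network is reset once at the end, not after each trial.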

Here is the link to the paper Jeremy refers to.

Cyclical Learning Rates for Training Neural Networks

2 Likes

+1 to this… is it the same mini-batch every time, or a different one? Are the weights updated each time?

There are other implementations of the LR finder, including one in PyTorch Lightning and some community-written callbacks for Keras (e.g., here).

1 Like

Why would an “ideal” learning rate found with a single mini-batch at the start of training keep being a good learning rate even after several epochs and further loss reductions? Wouldn’t the ideal learning rate be a local property of the loss function?

5 Likes

For the LR finder, why use the steepest point and not the minimum?

1 Like

No: since the network isn’t really training while the learning rate is too small, you don’t need a minimum.

1 Like
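
On the steepest-vs-minimum question above: given the (learning rate, loss) pairs recorded during the sweep, both suggestions are easy to compute. A small sketch (NumPy only; the loss curve is synthetic, purely for illustration):

```python
import numpy as np

# synthetic LR-finder curve: the loss falls, flattens, then diverges
lrs = np.logspace(-7, 1, 100)
log_lrs = np.log10(lrs)
losses = 2.0 - np.tanh(log_lrs + 4) + np.exp(log_lrs - 0.5)

# "min" suggestion: the learning rate with the lowest recorded loss
lr_at_min = lrs[np.argmin(losses)]

# "steepest" suggestion: where the loss falls fastest w.r.t. log(lr)
lr_steep = lrs[np.argmin(np.gradient(losses, log_lrs))]

print(f"min-loss lr: {lr_at_min:.1e}, steepest lr: {lr_steep:.1e}")
```

The minimum-loss point sits right at the edge of divergence, so training there is risky; the steepest point (or, as fastai’s lr_min suggestion does, the minimum-loss rate divided by 10) leaves a safety margin.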

When should we run the learning rate finder? Only at the beginning, or should we run it again after a couple of epochs?

1 Like

There are lr_min and lr_steep values in the code, which pick out the minimum-loss and steepest-slope learning rates; they’re part of the lr_find function (see the usage sketch below).
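
A quick usage sketch, assuming the fastai v2 API as of this course (the return type of lr_find has changed across versions, so treat this as illustrative rather than definitive):

```python
from fastai.vision.all import *

# the standard pets example from the course, just to get a Learner
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path/"images"),
    pat=r"(.+)_\d+.jpg$", item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)

# lr_find runs the mini-batch sweep discussed above and returns
# suggested rates; here, lr_min and lr_steep
suggested = learn.lr_find()
print(f"lr_min={suggested.lr_min:.2e}, lr_steep={suggested.lr_steep:.2e}")

learn.fine_tune(2, base_lr=suggested.lr_steep)
```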