Note: This is a wiki post - feel free to edit to add links from the lesson or other useful info.
Resources

Links from lesson
- Python for Data Analysis - Wes McKinney - link

Other useful links
- BCE vs BCEWithLogitsLoss - PyTorch Forum Discussion - link
- Mixed Precision Training Comparison - link
Notes by @Lankinen
As we approach the end of part 1, are there any plans for a part 2? If so, any idea when?
dcooper01 (Daniel Cooper), April 22, 2020, 1:38am
Follow-up on this – any thoughts on what you’d like to cover in Part 2?
rachel (Rachel Thomas), April 22, 2020, 1:39am
I will make a note about the part 2 question, and ask Jeremy at the break (since it is out of flow with the topic now, but I know people are wondering)
nareshr8 (Naresh), April 22, 2020, 1:41am
Also, there might be incorrect tagging by the experts in the dataset, which could have confused the model.
init_27 (Sanyam Bhutani), April 22, 2020, 1:46am
Extra reading: here’s a link to my interview with Leslie Smith (author of the cyclical learning rates work).
harish3110 (Harish Vadlamani), April 22, 2020, 1:47am
Is the learning rate plot in lr_find plotted against a single mini-batch?
ayansane (Alfa Yansane), April 22, 2020, 1:47am
Why don’t we need the minimum of the learning rate?
giacomov (Giacomo Vianello), April 22, 2020, 1:47am
During the lr_find() method, every learning rate is applied to a different batch, right? Is the network reset to the initial status after each trial?
No, we take steps on 100 different mini-batches, not just one, increasing the learning rate at each one.
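To make that concrete, here is a minimal PyTorch sketch of what an LR finder does. The function name run_lr_finder, the defaults, and the divergence threshold are illustrative, not fastai’s actual source; fastai likewise saves the weights before the sweep and restores them afterwards, so the mock training run leaves the model unchanged.

```python
import copy

def run_lr_finder(model, loss_fn, opt, train_dl,
                  start_lr=1e-7, end_lr=10.0, num_iter=100):
    """Sweep the LR exponentially over successive mini-batches, recording loss."""
    state = copy.deepcopy(model.state_dict())   # save weights for the reset at the end
    mult = (end_lr / start_lr) ** (1 / num_iter)
    lr, lrs, losses = start_lr, [], []
    for i, (xb, yb) in enumerate(train_dl):
        if i >= num_iter:
            break
        for g in opt.param_groups:              # apply the current learning rate
            g["lr"] = lr
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()                              # real update step on a *new* mini-batch
        lrs.append(lr)
        losses.append(loss.item())
        if losses[-1] > 4 * min(losses):        # stop once the loss clearly diverges
            break
        lr *= mult                              # exponential increase per batch
    model.load_state_dict(state)                # single reset after the whole sweep
    return lrs, losses
```

So the network is not reset between trials: each step lands on a fresh mini-batch with a slightly higher learning rate, and the weights are only restored once, after the whole sweep finishes.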
ram_cse (Ram Kripal), April 22, 2020, 1:48am
+1 to this… is it the same mini-batch every time, or a different one? Are the weights updated each time?
There are other implementations of the LR finder, including PyTorch Lightning and some community-written callbacks for Keras (e.g. here).
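For flavor, here is a minimal sketch of what such a community Keras callback might look like. The class and every detail below are my own illustration, not the linked example, and assume a TF2-style optimizer exposing an lr variable.

```python
import tensorflow as tf

class LRFinderCallback(tf.keras.callbacks.Callback):
    """Multiply the learning rate after each batch and record the loss."""
    def __init__(self, start_lr=1e-7, end_lr=10.0, num_steps=100):
        super().__init__()
        # Constant factor taking the LR from start_lr to end_lr in num_steps
        self.factor = (end_lr / start_lr) ** (1 / num_steps)
        self.start_lr, self.lrs, self.losses = start_lr, [], []

    def on_train_begin(self, logs=None):
        tf.keras.backend.set_value(self.model.optimizer.lr, self.start_lr)

    def on_train_batch_end(self, batch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))
        self.lrs.append(lr)
        self.losses.append(logs["loss"])
        if logs["loss"] > 4 * min(self.losses):  # halt once the loss diverges
            self.model.stop_training = True
        tf.keras.backend.set_value(self.model.optimizer.lr, lr * self.factor)
```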
ErickMFS (Erick Muzart Fonseca), April 22, 2020, 1:48am
Why would an “ideal” learning rate found with a single mini-batch at the start of training keep being a good learning rate even after several epochs and further loss reductions? Wouldn’t the ideal learning rate be a local property of the loss function?
For the LR finder, why use the steepest point and not the minimum?
No: since the model isn’t really training while the learning rate is too small, you don’t need that.
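As a rough illustration of how such suggestions can be computed from the recorded curve, here is toy code in the spirit of fastai’s lr_min/lr_steep heuristics (not its actual source; the function name suggest_lrs is mine):

```python
import numpy as np

def suggest_lrs(lrs, losses):
    """Pick candidate LRs from an LR-finder sweep."""
    lrs, losses = np.asarray(lrs), np.asarray(losses)
    # "min" suggestion: one tenth of the LR at the lowest recorded loss --
    # the minimum itself is usually already too aggressive to train at.
    lr_min = lrs[losses.argmin()] / 10
    # "steep" suggestion: the LR where the loss falls fastest,
    # i.e. the steepest negative slope of loss vs. log-LR.
    slopes = np.gradient(losses, np.log(lrs))
    lr_steep = lrs[slopes.argmin()]
    return lr_min, lr_steep
```

Both heuristics deliberately avoid the flat region at the low end of the sweep, where the loss barely moves because the model is not really training yet.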
yfrancois (Yann François), April 22, 2020, 1:48am
When should we run the learning rate finder? Only at the beginning, or should we re-run it after a couple of epochs?
The lr_find function also computes lr_min and lr_steep, which suggest learning rates based on the minimum of the recorded loss curve and on its steepest downward slope.
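Putting it together, a minimal usage sketch with the fastai v2 API (the PETS setup is just an example; the lr_min/lr_steep attribute names match the fastai version used at the time of this lesson and may differ in later releases):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path/"images"),
    pat=r"(.+)_\d+.jpg$", item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)

# Mock training run over mini-batches with an exponentially increasing LR;
# plots loss vs. LR and returns the suggestions discussed above.
suggestions = learn.lr_find()   # SuggestedLRs(lr_min=..., lr_steep=...)
learn.fine_tune(2, base_lr=suggestions.lr_steep)

# After unfreezing, the loss landscape changes, so it is common
# to re-run the finder before training further.
learn.unfreeze()
learn.lr_find()
```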