Lesson 5 In-Class Discussion ✅

The max value is the learning rate you pass. The starting learning rate is the max learning rate divided by 25 (a default you can change).
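A minimal sketch of that schedule, assuming a cosine shape and a warm-up fraction of 0.25 (fastai’s fit_one_cycle implements the real thing, so this is just for intuition):

```python
import math

def one_cycle_lr(step, total_steps, lr_max, div=25.0, pct_start=0.25):
    # One-cycle sketch: warm up from lr_max/div to lr_max, then anneal back to 0.
    lr_start = lr_max / div                    # starting LR: lr_max / 25 by default
    warm_steps = int(total_steps * pct_start)  # assumed warm-up fraction
    if step < warm_steps:                      # cosine ramp up to lr_max
        t = step / max(1, warm_steps)
        return lr_start + (lr_max - lr_start) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warm_steps) / max(1, total_steps - warm_steps)
    return lr_max * (1 + math.cos(math.pi * t)) / 2  # anneals to 0 at the end

# e.g. plot [one_cycle_lr(s, 1000, 1e-3) for s in range(1001)] to see the cycle
```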

Look at the abscissa (the x-axis): do you see one epoch? :wink:
It’s the whole training, yes.

Thank you. And the end-point value seems to be even lower than the start-point value?

The end value is 0.

So, even if we have, like, 10 training epochs, we have a single cycle of learning rate changes, right? (That’s where the name comes from, I guess).

Exactly.

What should the ‘default’ number of epochs in one cycle be? 4-5, as in the training notebooks?

We don’t have a good answer to that question, sadly. It depends, and that’s one of the things you have to figure out for the situation you’re in. Jeremy gave some rules of thumb in previous lessons as to whether you’re training for too long or not.

Can we see the number of epochs as a kind of regularization parameter? Or, if a model is well regularized, should you be able to train as much as you want?

It’s going to depend on your dataset. In this Kaggle solution from fastai v1 users, they used 4 epochs frozen and then 32 epochs unfrozen: https://www.kaggle.com/c/airbus-ship-detection/discussion/71664
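Not their actual code, but a minimal fastai-style sketch of that frozen-then-unfrozen pattern, on a stand-in dataset (PETS rather than ship detection); the epoch counts just mirror the numbers above:

```python
from fastai.vision.all import *

def is_cat(fname): return fname[0].isupper()  # PETS marks cats with capitalized filenames

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=accuracy)  # body starts out frozen
learn.fit_one_cycle(4)                                # 4 epochs, head only
learn.unfreeze()                                      # unfreeze the pretrained body
learn.fit_one_cycle(32, lr_max=slice(1e-6, 1e-4))     # 32 epochs, discriminative LRs
```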

Is the entire purpose of softmax to enable cross-entropy loss, or does it have other uses?

Not really. You will always overfit if you train forever (unless you keep getting more data, but in that case you’re not really doing more epochs over the same data).

When you have collected lots of new data, is it better to train from scratch or to start with the existing weights from an old model (same architecture)? Would the old weights help or harm?

Its purpose is to give you probabilities that add up to 1, with one bigger than the others.

So when you’ve trained for, say, 5 epochs and see that you could train more, do you start from scratch or just add another couple of epochs? If you add, do you run the learning rate finder first?

Isn’t it doing some normalization too?

Try both and see which one gives you the best result.
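For the ‘add a couple more epochs’ branch, a hedged sketch reusing the `learn` object from the sketch above (lr_find’s return value varies across fastai versions, so reading its plot by eye is the safe path):

```python
# Continue training instead of restarting: re-check the learning rate first.
learn.lr_find()                       # plots LR vs. loss; pick a new value by eye
learn.fit_one_cycle(2, lr_max=1e-5)   # a couple more epochs at a lower LR
```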

No, softmax is just there to give you probabilities.

If the new data is remotely close, it helps. Think of the old weights as being ‘better than random’: if they are, they help.

Softmax estimates the probability p_{j} of a training data point x belonging to class C_{j} when M mutually exclusive class labels \{C_{1}, C_{2}, ..., C_{M}\} are allowed: p_{j} = e^{z_{j}} / \sum_{k=1}^{M} e^{z_{k}}, where z_{j} is the raw score (logit) for class C_{j}.
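To make that concrete, a tiny sketch in plain NumPy (not fastai’s implementation):

```python
import numpy as np

def softmax(z):
    # Turn raw scores z into probabilities that sum to 1.
    z = z - z.max()      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099]
```

Note how the exponential exaggerates the largest score; that is the ‘one bigger than the others’ behavior mentioned above.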