Lesson 5 In-Class Discussion ✅

The max value is the learning rate you pass. The starting learning rate is the max learning rate divided by 25 (a default you can change).
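A minimal sketch of that schedule, assuming a cosine shape and a warm-up fraction of 0.25 (fastai’s fit_one_cycle implements the real thing, so this is just for intuition):

```python
import math

def one_cycle_lr(step, total_steps, lr_max, div=25.0, pct_start=0.25):
    # One-cycle sketch: warm up from lr_max/div to lr_max, then anneal back to 0.
    lr_start = lr_max / div                    # starting LR: lr_max / 25 by default
    warm_steps = int(total_steps * pct_start)  # assumed warm-up fraction
    if step < warm_steps:                      # cosine ramp up to lr_max
        t = step / max(1, warm_steps)
        return lr_start + (lr_max - lr_start) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warm_steps) / max(1, total_steps - warm_steps)
    return lr_max * (1 + math.cos(math.pi * t)) / 2  # anneals to 0 at the end

# e.g. plot [one_cycle_lr(s, 1000, 1e-3) for s in range(1001)] to see the cycle
```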

Look at the abscissa (the x-axis): do you see one epoch? :wink:
It’s the whole training, yes.

Thank you. And the end-point value seems to be even lower than the start-point value?

The end value is 0.

So, even if we have, like, 10 training epochs, we have a single cycle of learning rate changes, right? (That’s where the name comes from, I guess).

Exactly.

What should the ‘default’ number of epochs in one cycle be? 4-5, as in the training notebooks?

We don’t have a good answer to that question, sadly. It depends, and that’s one of the things you have to figure out for the situation you’re in. Jeremy gave some rules of thumb in previous lessons as to whether you’re training for too long or not.

Can we see the number of epochs as a kind of regularization parameter? Or, if a model is well regularized, should you be able to train as much as you want?

It’s going to depend on your dataset. In this Kaggle solution from fastai v1 users, they used 4 epochs frozen and then 32 epochs unfrozen: https://www.kaggle.com/c/airbus-ship-detection/discussion/71664
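Not their actual code, but a minimal fastai-style sketch of that frozen-then-unfrozen pattern, on a stand-in dataset (PETS rather than ship detection); the epoch counts just mirror the numbers above:

```python
from fastai.vision.all import *

def is_cat(fname): return fname[0].isupper()  # PETS marks cats with capitalized filenames

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=accuracy)  # body starts out frozen
learn.fit_one_cycle(4)                                # 4 epochs, head only
learn.unfreeze()                                      # unfreeze the pretrained body
learn.fit_one_cycle(32, lr_max=slice(1e-6, 1e-4))     # 32 epochs, discriminative LRs
```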

Is the entire purpose of softmax to enable cross-entropy loss, or does it have other uses?

Not really. You will always overfit if you train forever (unless you keep getting more data, but in that case you’re not really doing more epochs over the same data).

When you have collected lots of new data, is it better to train from scratch or to start with the existing weights from an old model (same architecture)? Would the old weights help or harm?

Its purpose is to give you probabilities that add up to 1, with one bigger than the others.

So when you’ve trained for, say, 5 epochs and see that you could train more, do you start from scratch or just add another couple of epochs? If you add, do you run the learning rate finder first?

Isn’t it doing some normalization too?

Try both and see which one gives you the best result.
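For the ‘add a couple more epochs’ branch, a hedged sketch reusing the `learn` object from the sketch above (lr_find’s return value varies across fastai versions, so reading its plot by eye is the safe path):

```python
# Continue training instead of restarting: re-check the learning rate first.
learn.lr_find()                       # plots LR vs. loss; pick a new value by eye
learn.fit_one_cycle(2, lr_max=1e-5)   # a couple more epochs at a lower LR
```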

No, softmax is just there to give you probabilities.

If the new data is remotely close, it helps. Think of the old weights as being ‘better than random’: if they are, they help.

Softmax estimates the probability p_{j} of a training data point x belonging to class C_{j} when M mutually exclusive class labels \{C_{1}, C_{2}, ..., C_{M}\} are allowed: p_{j} = e^{z_{j}} / \sum_{k=1}^{M} e^{z_{k}}, where z_{j} is the raw score (logit) for class C_{j}.
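To make that concrete, a tiny sketch in plain NumPy (not fastai’s implementation):

```python
import numpy as np

def softmax(z):
    # Turn raw scores z into probabilities that sum to 1.
    z = z - z.max()      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099]
```

Note how the exponential exaggerates the largest score; that is the ‘one bigger than the others’ behavior mentioned above.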