Lesson 1 In-Class Discussion ✅

Can you direct me to a paper which mentions this? I have been banging my head over this for quite some time. Again, what I mean to say is that it is not necessary for a CNN to retain the structure of the whole face exactly.

My Ref --> https://medium.com/ai³-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

I’m wondering if the 1.00 is the predicted probability for the incorrect class? So the model is saying it is 100% sure, but it is actually incorrect.

You’ll find that overfitting isn’t necessarily bad as long as your validation metric is still improving, especially in practical applications.

Should the slice(1e-6, 1e-4) be log-spaced instead of linearly spaced?
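
For context, fastai v1 appears to spread the slice endpoints geometrically (i.e. log-spaced) across the layer groups; here's a standalone numpy sketch of the difference (the 3 layer groups are just an assumption for illustration):

```python
import numpy as np

# Hypothetical setup: 3 layer groups, as in a typical cnn_learner.
start, stop, n_groups = 1e-6, 1e-4, 3

linear = np.linspace(start, stop, n_groups)   # [1.0e-06, 5.05e-05, 1.0e-04]
log = np.geomspace(start, stop, n_groups)     # [1.0e-06, 1.0e-05, 1.0e-04]

print(linear)  # linear spacing crowds the values toward the top end
print(log)     # geometric spacing gives a constant 10x step here
```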

But that doesn’t feel like a fair comparison of learning rates, because the model weights change.

Maybe, but then it’s not what Jeremy said, so we should clear that up 🙂

If we reduce the batch size, might it need a different LR? Since we’d now be updating the weights more frequently.
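
A common heuristic here (the linear scaling rule, not something fastai applies automatically) is to scale the LR in proportion to the batch size; a hedged sketch with made-up reference values:

```python
# Linear scaling rule: halve the batch size, halve the learning rate.
# base_bs and base_lr are illustrative reference values, not library defaults.
def scaled_lr(bs, base_bs=64, base_lr=3e-3):
    return base_lr * bs / base_bs

print(scaled_lr(32))   # 0.0015 -- smaller batches, smaller steps
print(scaled_lr(128))  # 0.006
```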

What about data labeled with bounding boxes?

It is a probability in the mathematical sense: your model will return things that add up to one. But since it has been trained to favor one category above all others, it’ll have a tendency to return 1. for one category and 0. for the others. So you shouldn’t interpret that number as a sign of confidence from your model. It’s in that sense that I say it isn’t a probability.
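
A minimal numpy illustration of that saturating tendency (the logits here are made up):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: positive outputs that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

# Even modest logit gaps produce a near-certain-looking output.
logits = np.array([8.0, 1.5, 0.5])
print(softmax(logits))        # ~[0.998, 0.0015, 0.0006]
print(softmax(logits).sum())  # 1.0 -- a distribution, not a confidence score
```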

You will be waiting many hours for the training to complete if you use a CPU… use a GPU if you value your time 🙂

Yes, I will.

Appreciate all the helper functions for the DataBunch object! As someone who entered deep learning from a completely different field, I can say with certainty that the most difficult part of implementing a new architecture, or applying one to a new dataset, is bringing the data in.
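
For anyone following along, the kind of helper being praised looks like this in fastai v1 (the path and settings are placeholders):

```python
from fastai.vision import ImageDataBunch, get_transforms

# One call handles splitting, labeling from folder names, transforms,
# batching and resizing -- the "bringing the data in" part.
data = ImageDataBunch.from_folder(
    'path/to/images',          # assumed layout: train/ and valid/ subfolders
    ds_tfms=get_transforms(),  # default augmentations
    size=224,
    bs=64,
)
```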

The documentation is super cool!

If I understand what you’re saying correctly, the model is using something like softmax, so it heavily favors the class it thinks is the right one over the others, and the number is likely to be close to 1. But my question was rather about the fact that that number should be low, as it’s the number representing what the model thought of the actual class (the one it got wrong). What am I missing here?
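
If this is about the numbers in fastai’s top-losses plot, the view can be reproduced like this (assuming a trained Learner named learn); the docs for your version are the surest way to settle what the last number means:

```python
from fastai.vision import ClassificationInterpretation

interp = ClassificationInterpretation.from_learner(learn)  # learn is assumed
# Each image title reads: prediction / actual / loss / probability
interp.plot_top_losses(9, figsize=(7, 7))
```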

Why does the conv learner not learn well when unfrozen? Is it because all the learning rates are the same? Why is this a problem, exactly? Isn’t this how many models are trained?
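
For reference, the usual fastai remedy is to unfreeze and pass a slice so the early layers get smaller rates (assuming a trained Learner named learn; the values are the ones from the lesson):

```python
# Fine-tune all layers, but protect the early, general-purpose layers
# with a much smaller learning rate than the task-specific head.
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```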

Tabular or text data will probably be OK-ish on a CPU; image/audio/video, absolutely not.
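
A quick way to check what you’re actually running on (plain PyTorch, nothing fastai-specific):

```python
import torch

# True means fastai will put the model and batches on the GPU by default.
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only')
```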

@sgugger
We didn’t specify the learning rate in the fit_one_cycle() method. Does it identify the LR by itself?

Ah sorry I misunderstood your first question!

It uses a default.

When you don’t specify one, it uses the default of 3e-3.
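
So these two calls should be equivalent (assuming a Learner named learn; 3e-3 is the default mentioned above):

```python
# With no max_lr, fit_one_cycle falls back to the library default of 3e-3.
learn.fit_one_cycle(4)
learn.fit_one_cycle(4, max_lr=3e-3)  # explicit version of the same call
```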
