Can you direct me to a paper which mentions this? I've been banging my head over this for quite some time. Again, what I mean to say is that it's not necessary for a CNN to retain the structure of the whole face exactly.
I’m wondering if the 1.00 is the predicted probability for the incorrect class? So the model is saying it is 100% sure, but it is actually incorrect.
You’ll find that overfitting isn’t necessarily bad as long as your validation metric is still improving, especially in practical applications.
Should the slice(1e-6, 1e-4) be log-spaced instead of linearly spaced?
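To illustrate what log-spaced vs. linearly spaced would mean for spreading learning rates across layer groups (this is plain numpy as a sketch, not fastai's actual code; the number of groups is hypothetical):

```python
import numpy as np

# Spread learning rates between the slice endpoints across 3 layer groups.
start, stop, n_groups = 1e-6, 1e-4, 3

linear = np.linspace(start, stop, n_groups)        # equal additive steps
log_spaced = np.geomspace(start, stop, n_groups)   # equal multiplicative steps

print(linear)      # middle group is ~half of stop: [1.0e-06 5.05e-05 1.0e-04]
print(log_spaced)  # each group 10x the previous:   [1.0e-06 1.0e-05 1.0e-04]
```

With linear spacing the middle group's LR is dominated by the larger endpoint, while log spacing keeps the ratio between consecutive groups constant, which is usually what you want when the endpoints span orders of magnitude.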
But that doesn’t feel like a fair comparison of learning rates, because the model weights change.
Maybe, but then it’s not what Jeremy said, so we should clear that up.
If we reduce the batch size, might it need a different LR? Since now we'd be updating the weights more frequently.
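There's a common heuristic for exactly this, the "linear scaling rule" (Goyal et al. 2017): scale the learning rate proportionally with batch size. A minimal sketch with hypothetical numbers; in practice lr_find is still the safer tool:

```python
def scaled_lr(base_lr, base_bs, new_bs):
    # Linear scaling rule: fewer examples per step means noisier
    # gradients, so take proportionally smaller steps.
    return base_lr * new_bs / base_bs

# Hypothetical example: a LR tuned at batch size 64, reused at batch size 16.
print(scaled_lr(3e-3, 64, 16))  # 0.00075
```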
What about bounding-box labeled data?
It is a probability in the mathematical sense: your model will output things that add up to one. But since it has been trained to favor one category above all others, it’ll have a tendency to return 1. for a category and 0. for the others. So you shouldn’t interpret that number as a sign of confidence from your model. It’s in that sense I say it isn’t a confidence.
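A quick numpy sketch of why the outputs look like that: softmax over logits with one clear winner produces a near-one-hot distribution, even though it always sums to one (the logit values below are made up):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

# Logits a trained classifier might produce: one class strongly favored.
logits = np.array([8.0, 1.0, 0.5, -1.0])
probs = softmax(logits)

print(probs.round(4))  # top class gets ~0.998, the rest are near 0
print(probs.sum())     # always sums to 1
```

So a value of 1.00 in the output tells you which class won by a wide margin in logit space, not that the model is calibrated to be right 100% of the time.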
You’ll be waiting many hours for training to complete if you use a CPU… use a GPU if you value your time.
Yes, I will.
Appreciate all the helper functions for the DataBunch object! As someone who entered deep learning from a completely different field, I can say with certainty that the most difficult part of implementing a new architecture or working with a new dataset is bringing the data in.
The documentation is super cool!
If I understand what you’re saying correctly, the model is using something like softmax, so it heavily favors the class it thinks is the right one over the others, and the number is likely to be close to 1. But my question was rather about the fact that that number should be low, since it’s the number representing what the model thought of the actual class (which it got wrong). What am I missing here?
What is the reason the conv learner doesn’t learn well when unfrozen? Is it because all the rates are the same? Why is this a problem exactly? Isn’t this how many models are trained?
Tabular/text data will probably be OK-ish on CPU; image/audio/video, absolutely not.
@sgugger
We didn’t specify the learning rate in the fit_one_cycle() method. Does it identify the LR by itself?
Ah sorry I misunderstood your first question!
it uses a default
When you don’t specify one, it uses the default of 3e-3.