Can you direct me to a paper which mentions this? I've been banging my head over this for quite some time. Again, what I mean to say is that it's not necessary for a CNN to retain the structure of the whole face exactly.
I’m wondering if the 1.00 is the predicted probability for the incorrect class? So the model is saying it is 100% sure, but it is actually incorrect.
You’ll find that overfitting isn’t necessarily bad as long as your validation metric is still improving, especially in practical applications.
Should the slice(1e-6, 1e-4) be log-spaced instead of linearly spaced?
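To illustrate what log-spaced vs. linearly spaced would mean for spreading learning rates across layer groups (this is plain numpy as a sketch, not fastai's actual code; the number of groups is hypothetical):

```python
import numpy as np

# Spread learning rates between the slice endpoints across 3 layer groups.
start, stop, n_groups = 1e-6, 1e-4, 3

linear = np.linspace(start, stop, n_groups)        # equal additive steps
log_spaced = np.geomspace(start, stop, n_groups)   # equal multiplicative steps

print(linear)      # middle group is ~half of stop: [1.0e-06 5.05e-05 1.0e-04]
print(log_spaced)  # each group 10x the previous:   [1.0e-06 1.0e-05 1.0e-04]
```

With linear spacing the middle group's LR is dominated by the larger endpoint, while log spacing keeps the ratio between consecutive groups constant, which is usually what you want when the endpoints span orders of magnitude.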
But that doesn’t feel like a fair comparison of learning rates, because the model weights change.
Maybe, but then it’s not what Jeremy said, so we should clear that up.
If we reduce the batch size, might it need a different LR? Since now we'd be updating the weights more frequently.
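There's a common heuristic for exactly this, the "linear scaling rule" (Goyal et al. 2017): scale the learning rate proportionally with batch size. A minimal sketch with hypothetical numbers; in practice lr_find is still the safer tool:

```python
def scaled_lr(base_lr, base_bs, new_bs):
    # Linear scaling rule: fewer examples per step means noisier
    # gradients, so take proportionally smaller steps.
    return base_lr * new_bs / base_bs

# Hypothetical example: a LR tuned at batch size 64, reused at batch size 16.
print(scaled_lr(3e-3, 64, 16))  # 0.00075
```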
What about bounding-box labeled data?
It is a probability in the mathematical sense: your model will output things that add up to one. But since it has been trained to favor one category above all others, it’ll have a tendency to return 1. for a category and 0. for the others. So you shouldn’t interpret that number as a sign of confidence from your model. It’s in that sense I say it isn’t a confidence.
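A quick numpy sketch of why the outputs look like that: softmax over logits with one clear winner produces a near-one-hot distribution, even though it always sums to one (the logit values below are made up):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

# Logits a trained classifier might produce: one class strongly favored.
logits = np.array([8.0, 1.0, 0.5, -1.0])
probs = softmax(logits)

print(probs.round(4))  # top class gets ~0.998, the rest are near 0
print(probs.sum())     # always sums to 1
```

So a value of 1.00 in the output tells you which class won by a wide margin in logit space, not that the model is calibrated to be right 100% of the time.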
You’ll be waiting many hours for training to complete if you use a CPU… use a GPU if you value your time.
Yes, I will.
Appreciate all the helper functions for the DataBunch object! As someone who entered deep learning from a completely different field, I can say with certainty that the most difficult part of implementing a new architecture or working with a new dataset is bringing the data in.
The documentation is super cool!
If I understand what you’re saying correctly, the model is using something like softmax, so it heavily favors the class it thinks is the right one over the others, and the number is likely to be close to 1. But my question was rather about the fact that that number should be low, since it’s the number representing what the model thought of the actual class (which it got wrong). What am I missing here?
What is the reason the conv learner doesn’t learn well when unfrozen? Is it because all the rates are the same? Why is this a problem exactly? Isn’t this how many models are trained?
Tabular/text data will probably be OK-ish on CPU; image/audio/video, absolutely not.
@sgugger
We didn’t specify the learning rate in the fit_one_cycle() method. Does it identify the LR by itself?
Ah sorry I misunderstood your first question!
it uses a default
When you don’t specify one, it uses the default of 3e-3.