Awesome. Thank you!
Actually, I made a silly error
batch = dls.one_batch()
len(batch) # ==> 4
x = batch[0]
# ys = batch[1] # len ==> 1 (ofc!)
ys = batch[1:] # len ==> 3
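The slicing above can be sketched in plain Python (lists stand in for tensors here; the shapes and names are hypothetical, but the slicing logic is the same):

```python
# Simulate a fastai batch: one input element plus three target elements.
batch = (["x_data"], ["y1"], ["y2"], ["y3"])

n_inp = 1            # like dls.n_inp: how many leading elements are inputs
xs = batch[:n_inp]   # the model inputs
ys = batch[n_inp:]   # everything after the inputs is a target

print(len(batch))    # 4
print(len(ys))       # 3
```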
On further thought, this shouldn’t be done, because your model would have one parameter in the final layer making predictions for the ignore_index element.
I think tweaking Categorize to support ignore_index is the correct approach.
Could somebody please remind me how to add the test dataset to the dataloader after training? My training data was split into training and validation sets (originally in the same folder, 80/20), and now I would like to know how well my model works on the test set, which is in another folder, i.e. get predictions and metrics.
test_dl = learn.dls.test_dl(items)
learn.validate(dl=test_dl)
@muellerzr if you have time, can you please look into this? Ignore the code inside show_results; once it gets called I can work on that. The typedispatch system is what’s causing this error.
I’m not sure there. Sylvain would be more help; I haven’t faced that error before.
I am building a simple cat vs. dog classifier BUT using the multi-cat example, so that if the picture is not a dog or a cat I get no output. This is my learner and its threshold:
learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[partial(accuracy_multi, thresh=0.9)]).to_fp16()
I trained the model and tried it with a random pic. The prediction is NOT above 0.9 but it still says ‘Dog’… what am I missing here?
Makes no sense to me
That’s because that’s done in the loss function, not the metric. So you need to adjust the threshold of the BCEWithLogitsLossFlat loss function (I had this same question myself a week or so ago).
Can you explain a bit more? My understanding is that the loss function is what we are trying to minimize, and the metric is just a proxy for how well our model is working. So I would have said it belongs in the metric, not in the loss…
I re-trained the model again… how can I prevent it from being so confident about the wrong class? I find this really concerning.
I feel doing
learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[partial(accuracy_multi, thresh=0.95)], loss_func=BCEWithLogitsLossFlat(thresh=0.95)).to_fp16()
is dangerous, in the sense that the model is incentivized to make very confident predictions to minimize the loss during training. I would like this behavior during prediction only.
Yes, so predict, when it works, relies on the loss_func's decodes and activation to determine what category it is. For example, if you had no metric whatsoever, predict would still work exactly the same, and this is the reason why. Metrics != inference, as a metric still needs a y to work, doesn’t it?
So when we define our loss ourselves (rather than letting fastai do it) as such (for exactly where, see here):
loss_func = BCEWithLogitsLossFlat(thresh=0.5)
0.5 is the default, so we’d want to adjust this threshold to what we want to see. Ideally you’d also make this and the metric’s threshold exactly the same.
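To see what the threshold actually changes, here is a minimal plain-Python sketch (hypothetical logits and a hand-rolled sigmoid, not fastai internals):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical raw model outputs (logits) for ["cat", "dog"]
logits = [0.2, 2.5]
probs = [sigmoid(l) for l in logits]   # roughly [0.55, 0.92]

# With the default thresh=0.5 both labels pass; at 0.95 neither does.
for thresh in (0.5, 0.95):
    preds = [p > thresh for p in probs]
    print(thresh, preds)
```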
Why is this important? Remember the decodes and activation I brought up? When we do learn.predict, we take the raw output of our model, pass it through the activation of that loss function and then through its decodes as well. For BCE, they look like so:
def decodes(self, x): return x>self.thresh
def activation(self, x): return torch.sigmoid(x)
So we have our sigmoid activation followed by our threshold to binarize on. Then adjust learn.loss_func.thresh to be whatever you’d like; that’s all that’s needed.
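To show how the two methods chain together, here is a toy sketch (the ToyBCELoss class is an illustration, not the real fastai class):

```python
import math

class ToyBCELoss:
    """Mimics the two methods quoted above (not the real fastai class)."""
    def __init__(self, thresh=0.5):
        self.thresh = thresh

    def activation(self, xs):   # squash logits into [0, 1]
        return [1 / (1 + math.exp(-x)) for x in xs]

    def decodes(self, xs):      # binarize against the threshold
        return [x > self.thresh for x in xs]

loss = ToyBCELoss(thresh=0.9)
raw = [3.0, -1.0]               # hypothetical model logits
decoded = loss.decodes(loss.activation(raw))
print(decoded)  # [True, False]: sigmoid(3.0) ~ 0.95 > 0.9; sigmoid(-1.0) ~ 0.27
```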
Also, to expand on that predict I brought up: activation is always called, while decodes is called when with_decoded=True. Does this help clear up some confusion?
Ah, thanks a lot! I am missing one key part here. When I look at the code of BCE (or the snippet you pasted), what exactly are the decodes and activations? I understand that we need a number between 0 and 1 (sigmoid takes care of this) and that we only take it as valid if > thresh, but in the code I do not see how the two play together; it seems to me both return independently.
I tried checking the learn.predict?? code but I do not see how to access the get_preds function inside it.
So let’s walk through predict, specifically where it calls get_preds:
inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
We see here that we get back the input we passed in, along with the with_decoded output. This decoded output comes from the loss function only; decode_batch then decodes from our DataBlock.
For a different way to look at it, take my fastinference library I’ve been building. I rebuilt the get_preds function to make it a bit more efficient, but for all intents and purposes it still acts and behaves the same way the framework does:
for batch in dl:
    with torch.no_grad():
        ...
        if decoded_loss or fully_decoded:
            out = x.model(*batch[:x.dls.n_inp])
            raw.append(out)
            dec_out.append(x.loss_func.decodes(out))
        else:
            raw.append(x.model(*batch[:x.dls.n_inp]))
This is how I get predictions (this is all hidden inside get_preds and the GatherPreds callback, so it’s hard to figure out; presume it’s this with a bit more abstraction).
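As a runnable toy version of that loop (the model, Loss class, and batches below are all stand-ins, not fastinference internals):

```python
import math

# Stand-ins for the pieces the loop uses: a "model" producing logits
# and a loss function that knows how to decode them.
def model(x):
    return [v * 2.0 for v in x]                # fake logits

class Loss:
    thresh = 0.5
    def decodes(self, out):
        return [1 / (1 + math.exp(-o)) > self.thresh for o in out]

loss_func = Loss()
dl = [([0.5, -0.5],), ([1.0, -2.0],)]          # fake batches (inputs only)
n_inp = 1

raw, dec_out = [], []
for batch in dl:
    out = model(*batch[:n_inp])                # forward pass on the inputs
    raw.append(out)
    dec_out.append(loss_func.decodes(out))     # decode via the loss function

print(dec_out)  # [[True, False], [True, False]]
```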
So we can see that if I want to decode my loss, I decode via the loss_func. At the end I’ll go through and get a result similar to predict here:
if not raw_outs:
    try: outs.insert(0, x.loss_func.activation(tensor(raw)).numpy())
    except: outs.insert(0, dec_out)
else:
    outs.insert(0, raw)
if fully_decoded: outs = _fully_decode(x.dls, inps, outs, dec_out, is_multi)
if decoded_loss: outs = _decode_loss(x.dls.vocab, dec_out, outs)
return outs
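The vocab-decoding step at the end can be pictured like this (a hypothetical sketch of mapping thresholded outputs back to labels, not the real _decode_loss):

```python
# Map boolean decodes back to human-readable labels, roughly what a
# vocab-decoding step does (hypothetical, not fastinference's code).
vocab = ["cat", "dog"]
dec_out = [True, False]          # output of loss_func.decodes for one item
labels = [v for v, keep in zip(vocab, dec_out) if keep]
print(labels)  # ['cat']
```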
(And I’m going to do a video walkthrough of this whole thing this weekend too, that may help some.)
That’s a long explanation, but does this help? To see _fully_decode and _decode_loss, see here; I’m going to go through those in the video, as it’s a lot to explain here (but it’s good to keep in mind, because the fastai framework operates in this same way!). Also, my version has an option to return a fully decoded output similar to learn.predict, which is why it will look different.
That’s fantastic!! A pity I can only give it a single like. Are you having any more video sessions? Any fastai-related study group links? I was quite busy over the past month (job change) but I would like to get fully back to it!!
On another note, for those reading this thread: this is how I solved the problem.
learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[partial(accuracy_multi, thresh=0.95)], loss_func=BCEWithLogitsLossFlat(thresh=0.5)).to_fp16()
learn.loss_func = BCEWithLogitsLossFlat(thresh=0.95)
learn.predict('real-ai.jpg')
So basically, for the metric I use the threshold I will use during evaluation, to get an idea of how the model will behave. However, during training I use the 0.5 threshold so as not to incentivize extreme predictions. Hope this makes sense! Still learning myself.
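Numerically, the effect of swapping thresholds might look like this (hypothetical logits, plain Python rather than fastai):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical logits for a picture that is neither a cat nor a dog:
# the model is mildly "dog-ish" but far from certain.
probs = [sigmoid(l) for l in [-1.0, 1.5]]   # ["cat", "dog"], roughly [0.27, 0.82]

train_thresh, predict_thresh = 0.5, 0.95
print([p > train_thresh for p in probs])    # [False, True]: would say "dog"
print([p > predict_thresh for p in probs])  # [False, False]: no label at inference
```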
Yes and no, they’re unrelated to the course, and generally just YouTube uploads. I may make the next one live if people are interested, this weekend I’ll be doing a video on my fastinference
library (where all that was taken from), so if people are interested in that being live let me know
I would be interested in both, really like your teaching style I just need to find a better way not to miss out on them!
Coming back to the question about how to add the test dataset to the test dataloader… this is what I have so far. I tried several things but something is wrong.
IIRC test_dl now has an is_labelled parameter. Let me go check real quick.
Edit: @mgloria yes, it has a with_labels parameter now. Pass True:
test_dl = data.test_dl(fnames, with_labels=True)