Model has excellent performance on validation, terrible performance when scoring the same validation data loaded again

[Solved, see below]
I’ll show the problem in two screenshots. The full notebook is here

The model trains to 99% accuracy, yay!

Now we load the same data into the pieces2 DataLoaders in the same way. We score the model and it’s bad. Why?

Yes, it’s the exact same data:

Good news: I’m able to get the same accuracy (~99%) when I manually calculate the prediction for each item. Why this isn’t reflected in ClassificationInterpretation remains to be seen.
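A minimal sketch of that kind of per-item check (not the exact code from the notebook; `items` and `labels` are assumed to hold the validation image paths and their true classes):

```python
# Score each item individually with learn.predict and compare
# the decoded prediction against the known label.
preds = [learn.predict(item)[0] for item in items]
acc = sum(str(p) == str(y) for p, y in zip(preds, labels)) / len(labels)
print(f"manual accuracy: {acc:.4f}")
```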

Resolved! (although not understood)

The trick is to build the input to ClassificationInterpretation with learn.dls.test_dl(items, with_labels=True) instead of building it as its own DataLoader. See below:
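Roughly like this (a sketch; `items` is assumed to be the same list of validation image paths):

```python
from fastai.vision.all import *

# Build a DataLoader that reuses the learner's own validation-time
# transforms, then hand it to ClassificationInterpretation.
test_dl = learn.dls.test_dl(items, with_labels=True)
interp = ClassificationInterpretation.from_learner(learn, dl=test_dl)
interp.plot_confusion_matrix()
```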

According to the documentation, test_dl uses the "validation transforms of dls". But I’m not able to tell where, why, or how these transforms are different :woman_shrugging: Full notebook here


Certain augmentations (such as warping, etc.) are only applied to the training set, whereas resizing and normalization are applied to both sets.
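For example, in a typical DataBlock setup (a sketch, not the notebook’s actual code):

```python
from fastai.vision.all import *

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    item_tfms=Resize(224),  # runs on train AND valid
    batch_tfms=aug_transforms(max_warp=0.2)                 # random augs: train only
               + [Normalize.from_stats(*imagenet_stats)],   # runs on both splits
)
```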


Starting to make sense. Do you know which properties of the TfmdDLs would show the difference?
I’ve checked .after_item and .tfms of the respective dls and they look the same. (All the attributes I’ve checked look the same.)

Also, I thought I accounted for the train/valid difference in my experiments by using pieces2.valid as the target, not pieces2.train

It’s properties of the transforms themselves. Each has a split_idx: if it’s None the transform runs on every split, 0 means it runs on the training set only, and 1 means validation only, IIRC
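You can inspect this directly (a sketch):

```python
# split_idx semantics: None -> all splits, 0 -> train only, 1 -> valid only.
for tfm in learn.dls.train.after_batch:
    print(type(tfm).__name__, tfm.split_idx)
```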

Turns out it was lacking Normalize in the .after_batch. From what I understand, this normalization step also needs* parameters that are fit during training…and I was able to extract them from the learner.
*: not sure if it “needs” them, but it doesn’t work for me without them
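Something like this (a sketch; `dblock2` and `path` are placeholders for the second, manually built pipeline):

```python
from fastai.vision.all import *

# Pull the fitted Normalize transform (with its mean/std) off the learner...
norm = next(t for t in learn.dls.train.after_batch if isinstance(t, Normalize))
print(norm.mean, norm.std)

# ...and reuse those stats when building the second DataLoaders.
dblock2 = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    item_tfms=Resize(224),
    batch_tfms=Normalize(mean=norm.mean, std=norm.std),
)
pieces2 = dblock2.dataloaders(path)
```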


If you don’t pass statistics to Normalize, it’ll use the first batch of data it sees. Now, one more thing: you used cnn_learner, so Normalize was actually being applied even though you forgot it, and it was using imagenet_stats for the statistics. So if you want that behavior, you should add a call to Normalize to your batch transforms with either imagenet_stats or your own data’s statistics (but if you’re using a pre-trained model you should use imagenet_stats). I think the Normalize that was there was only used during training (I’m not 100% sure here, just going off of you not finding it in after_batch; also, the fact it’s applied in training and not validation is probably a bug).
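One way to confirm what was actually used (a sketch):

```python
from fastai.vision.all import *

# Check whether cnn_learner silently added Normalize, and with which
# stats, by comparing against fastai's imagenet_stats.
norm = next(t for t in learn.dls.train.after_batch if isinstance(t, Normalize))
print(norm.mean.flatten(), norm.std.flatten())
print(imagenet_stats)  # ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
```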


> is probably a bug

Great, more stuff to do in the morning :sunny: