Model has excellent performance on the validation set, but terrible performance when scoring the same validation data loaded again

[Solved, see below]
I’ll show the problem in two screenshots. The full notebook is here

The model trains with 99% accuracy, yay!

Now we load the same data into the pieces2 DataLoader in the same way. We score the model and it’s bad. Why?

Yes, it’s the exact same data:

Good news: I’m able to get the same accuracy (~99%) when I manually calculate it for each item. Why this isn’t reflected in ClassificationInterpretation remains to be seen.

Resolved! (although not understood)

The trick is to build the input to ClassificationInterpretation with learn.dls.test_dl(items, with_labels=True) instead of building it as its own dataloader. See below:
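In code, the fix looks roughly like this — a sketch assuming a trained `learn` and a list `items` of validation items as in the notebook (the `dl=` keyword is `ClassificationInterpretation.from_learner`'s dataloader parameter; I haven't run this exact cell):

```python
# Build a dataloader that applies the validation transforms of learn.dls,
# then interpret on it instead of on a hand-built dataloader:
test_dl = learn.dls.test_dl(items, with_labels=True)
interp = ClassificationInterpretation.from_learner(learn, dl=test_dl)
interp.plot_confusion_matrix()
```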

According to the documentation, test_dl uses the "validation transforms of dls". But I’m not able to tell where, why, or how these transforms are different :woman_shrugging: Full notebook here


Certain augmentations (such as warping) are applied only to the training set, whereas resizing and normalization are applied to both sets.


Starting to make sense. Do you know which properties of the TfmdDL’s would show the difference?
I’ve checked .after_item and .tfms of the respective dls and they look the same. (Every attribute I’ve checked looks the same.)

Also, I thought I accounted for the train/valid difference in my experiments by using pieces2.valid as the targets, not pieces2.train.

It’s a property of the transforms themselves. Each has a split_idx: if it’s None the transform runs on every split, while split_idx=0 means it runs on the training set only, IIRC.
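The split_idx dispatch can be illustrated with a tiny pure-Python sketch (this is not fastai's code — the class and function names here are made up for illustration; the assumed rule is split_idx=None runs everywhere, split_idx=0 runs on training only):

```python
# Toy model of how a transform's split_idx decides where it runs.
class Tfm:
    def __init__(self, name, split_idx=None):
        self.name, self.split_idx = name, split_idx

def active_tfms(tfms, split_idx):
    """Names of transforms that run on the split with this index
    (0 = train, 1 = valid)."""
    return [t.name for t in tfms
            if t.split_idx is None or t.split_idx == split_idx]

tfms = [Tfm("Resize"),             # no split_idx: every split
        Tfm("Warp", split_idx=0),  # augmentation: training only
        Tfm("Normalize")]          # every split

print(active_tfms(tfms, 0))  # train gets all three
print(active_tfms(tfms, 1))  # valid skips the augmentation
```

This is why two dataloaders can have identical-looking transform lists yet behave differently: the difference lives on the transforms, not on the dataloader's attributes.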

Turns out it was lacking Normalize in .after_batch. From what I understand, this normalization step also needs* parameters (mean and std) that get fit during training…and I was able to extract them from the learner.
*: not sure if it strictly “needs” them, but it doesn’t work for me without them


If you don’t pass statistics to Normalize, it will use the first batch of data it sees. One more thing: you used cnn_learner, so Normalize was actually being applied even though you forgot it, and it was using imagenet_stats for the data statistics. If you want that behavior, you should add a call to Normalize in your batch transforms with either imagenet_stats or your own data’s statistics (though with a pre-trained model you should use imagenet_stats). I think the Normalize that was added automatically was only used during training (I am not 100% sure here, just going off of you not finding it in after_batch; the fact that it’s applied in training and not validation is probably a bug).
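Adding Normalize explicitly would look something like this sketch (the DataBlock arguments other than batch_tfms are placeholders, not the poster's actual setup; I haven't run this against the notebook):

```python
from fastai.vision.all import *

# Explicit Normalize with the pre-trained model's statistics, so scoring
# dataloaders built later pick it up too:
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=parent_label,
    item_tfms=Resize(224),
    batch_tfms=[*aug_transforms(),
                Normalize.from_stats(*imagenet_stats)])
```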


> is probably a bug
Great, more stuff to do in the morning :sunny: