Get_preds: which test file corresponds with which row in the result tensor?

So, this:
result=learn.get_preds(ds_type=DatasetType.Test)
runs fine on my test set

result[0] is a tensor of shape (79726, 10), which is perfect because the test directory has 79726 files and the network has 10 outputs.

But I have no idea which test file any particular row out of the 79726 corresponds to. In what order does get_preds iterate over the files?

I tried this:
testdat = ImageClassificationDataset.from_single_folder('/home/nbuser/hw/driving/test',classes=data.classes)

and then

testdat.x[0] evaluates to PosixPath('/home/nbuser/hw/driving/test/img_52992.jpg')

This gives me a plausible way to match testdat.x[i] with result[0][i] but I’m not 100% confident that this is exactly what get_preds was doing…
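The matching described above can be sketched in plain Python. This is only an illustration under the assumption that row i of the predictions belongs to file i, which is exactly the point in question. The helper name pair_rows_with_files and the demo values are hypothetical stand-ins for testdat.x, result[0] and data.classes.

```python
from pathlib import Path

def pair_rows_with_files(paths, rows, classes):
    """Pair row i of the predictions with file i, ASSUMING both share one order."""
    if len(paths) != len(rows):
        raise ValueError(f"{len(paths)} files but {len(rows)} prediction rows")
    paired = []
    for path, row in zip(paths, rows):
        scores = list(row)
        # Index of the highest score in this row (first one wins on ties)
        best = max(range(len(scores)), key=scores.__getitem__)
        paired.append((Path(path).name, classes[best]))
    return paired

# Hypothetical stand-ins for testdat.x, result[0] and data.classes
demo_paths = [Path("test/img_52992.jpg"), Path("test/img_7.jpg")]
demo_rows = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
demo_classes = ["c0", "c1", "c2"]
demo_pairs = pair_rows_with_files(demo_paths, demo_rows, demo_classes)
```

With the real tensor, result[0].tolist() gives plain nested lists that can be passed as rows. The length check only catches a size mismatch; it cannot detect a different ordering.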

help!

Yes, they should be aligned: result[0][i] is for testdat.x[i]. To verify, you can plot an image together with its prediction and check that the result makes sense:

from fastai.vision import open_image
img = open_image(testdat.x[0])  # first test image
predicted = learn.data.classes[result[0][0].argmax()]  # class with the highest score in row 0
... # plot image and its title for sanity check

I understand that this check would support the hypothesis that the order is the same.

But it would be great to have 100% assurance from an author of the code, or someone better than me at digging into the source, that this is indeed guaranteed behavior and invariant across batch sizes, random seeds, and other possible variables.
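Short of an answer from the authors, one empirical check is to produce the predictions twice (for example with different batch sizes) and compare the two results row by row. The sketch below is not part of fastai; mismatched_rows is a hypothetical helper and the demo values are made up.

```python
def mismatched_rows(preds_a, preds_b, tol=1e-5):
    """Return the indices of rows that differ by more than tol between two runs."""
    if len(preds_a) != len(preds_b):
        raise ValueError("the two runs have different numbers of rows")
    mismatched = []
    for i, (row_a, row_b) in enumerate(zip(preds_a, preds_b)):
        if any(abs(a - b) > tol for a, b in zip(row_a, row_b)):
            mismatched.append(i)
    return mismatched

# Made-up example: the second run has its first two rows swapped
run_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
run_b = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]]
swapped = mismatched_rows(run_a, run_b)
identical = mismatched_rows(run_a, run_a)
```

An empty list only shows that the two runs agree with each other. It does not prove that either run follows the file order, so it complements the plotting check rather than replacing it.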

I’m also looking for a solid answer on this. In fact, it seems that the order of the images is not the order of the predictions, at least in my case.

I’m not seeing anything in the source that exposes the filenames in the same order as the predictions, so that the two could be matched up.
