Getting filenames and corresponding predictions

I have been trying to construct a list of pairs of filenames and the corresponding ground truths. Here’s how I tried to do this:

train_preds, train_ground_truths = learn.get_preds(ds_type=DatasetType.Train)
train_paths = list(map(lambda path: str(path), learn.data.valid_ds.items))
train_fnames = list(map(lambda path: path.split('/')[:-1], train_paths))
train_fnames_and_ground_truths = list(zip(train_fnames, train_ground_truths))

However, I did a few manual checks and it seems that the two arrays I am zipping together here are completely misaligned… How can I do this?

There’s a similar thread here, Get the filenames of the data in the Test set in the order they're predicted, but there’s no definite, practical answer there either.

Hmm, I guess someone needs to submit a pull request to update the fast.ai docs and library to make this easier, but check out my previous posts: How to review images for bad classifications + How do I get my list of predictions match the order of the images in my test folder?

It’s not a copy&paste solution, but I think these posts should provide you with all the ingredients. I guess the main point for you is that you need the ds_idx instead of just iterating over .items.
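To spell the ds_idx idea out, here is a minimal sketch in plain Python (not fastai code). The names `items`, `batch_order` and `pair_by_index` are hypothetical stand-ins: `items` plays the role of the dataset's file list, and `batch_order` is the order in which a possibly shuffled loader visited the samples. The point is only that each prediction is paired through the dataset index it came from, not through its position in the output.

```python
from pathlib import Path
import random


def pair_by_index(items, preds, ds_idxs):
    """Pair each prediction with the filename of the dataset item it came from.

    items   -- file paths in dataset order (stand-in for a dataset's .items)
    preds   -- predictions in the order they were produced
    ds_idxs -- for each prediction, the dataset index it was computed from
    """
    if len(preds) != len(ds_idxs):
        raise ValueError("need exactly one dataset index per prediction")
    return [(Path(str(items[i])).name, p) for i, p in zip(ds_idxs, preds)]


# Hypothetical data: five made-up file paths in dataset order.
items = [Path("data/train") / f"img_{n}.jpg" for n in range(5)]

# Simulate a shuffled loader: samples are visited in a random order,
# and each fake prediction records the dataset index it belongs to.
rng = random.Random(0)
batch_order = list(range(len(items)))
rng.shuffle(batch_order)
preds = [f"pred_for_{i}" for i in batch_order]

# Position-based pairing: only correct if the loader kept dataset order.
naive = list(zip([p.name for p in items], preds))

# Index-based pairing: consistent regardless of the visiting order.
aligned = pair_by_index(items, preds, batch_order)
```

If the predictions come from a loader that shuffles, a plain zip against .items will not line up, which would match the symptom described in the question. Note also that the snippet in the question reads valid_ds.items while predicting on the training set, and that split('/')[:-1] keeps the directory parts rather than the filename.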


Go look at my new PR :slight_smile: Here is how I extrapolated:

        for j, idx in enumerate(self.tl_idx):
            da, cl = self.interp.data.dl(self.interp.ds_type).dataset[idx]

                img, lbl = self.interp.data.valid_ds[idx]
                fn = self.interp.data.valid_ds.x.items[idx]
                fn = re.search(r'([^/*]+)_\d+.*$', str(fn)).group(0)
                img.show(ax=ax[row, col])
                ax[row,col].set_title(fn)
                x += 1
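
As a standalone illustration of the filename step, here is a small sketch that compares the regex from the snippet with pathlib's `Path.name`. The example paths and the two helper names are made up for illustration; this is not part of the PR.

```python
import re
from pathlib import Path

# Same pattern as in the snippet above, written as a raw string.
PATTERN = re.compile(r'([^/*]+)_\d+.*$')


def fname_by_regex(path):
    """Return the regex match, or None when the path has no '_<digits>' part."""
    m = PATTERN.search(str(path))
    return m.group(0) if m else None


def fname_by_pathlib(path):
    """Return the final path component, whatever its naming scheme."""
    return Path(str(path)).name


simple = "data/train/cat_12.jpg"   # hypothetical path
tricky = "data/run_1/cat_12.jpg"   # a directory name that also contains '_<digits>'
plain = "data/train/cat.jpg"       # no '_<digits>' anywhere

print(fname_by_regex(simple), fname_by_pathlib(simple))
print(fname_by_regex(tricky), fname_by_pathlib(tricky))
print(fname_by_regex(plain), fname_by_pathlib(plain))
```

The regex works for names like cat_12.jpg, but it starts matching at the first `_<digits>` it finds, so a directory such as run_1 ends up in the result, and a name without `_<digits>` gives no match at all. `Path.name` avoids both cases if all you need is the last path component.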

Thank you, @haverstind and @muellerzr, I’ll check out both.

Could you share a link to your PR? I couldn’t find it.

@haverstind, I checked both of the linked posts, but it seems to me that what you did there was filter a subset of the predictions, which is why you needed to keep track of the indices. If no filtering is done in your loops, then i == ds_idx for all entries, so it’s not really doing anything. I just tried to follow those snippets, but I am still getting inconsistent columns:

image

Could you please spell out what you had in mind, maybe I misunderstood?

That’s the PR. The code I showed is how I got the filenames for the cases where two classes matched. You should be able to extrapolate from there. A link to the source code on my GitHub should be there as well.

In the code, fn is the filename. You can see where it lives in items.

I’m using 1.0.44, so I don’t have the class that you are using there. But surely it can’t be that hard to zip together preds and fnames? This is such a frustrating thing; it should be a no-brainer… :confused:

I show it with img and fname. Give me a few minutes and I’ll show a better example.

This isn’t quite “zipping”, as I don’t know how to do that in Python yet, but the source code for that function is here.

Look under _plot_imgs.

Though, looking back, it may not be the best choice, as we assume losses. Is your data labeled, or are we doing this on test sets?

Thanks @muellerzr.
This is what I was looking for.