Getting filenames and corresponding predictions

mkd · July 19, 2019, 8:18pm

I have been trying to construct a list of pairs of filenames and the corresponding ground truths. Here’s how I tried to do this:

    train_preds, train_ground_truths = learn.get_preds(ds_type=DatasetType.Train)
    train_paths = list(map(lambda path: str(path), learn.data.valid_ds.items))
    train_fnames = list(map(lambda path: path.split('/')[:-1], train_paths))
    train_fnames_and_ground_truths = list(zip(train_fnames, train_ground_truths))

However, I did a few manual checks and it seems that the two arrays I am zipping together here are completely misaligned… How can I do this?
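
A note on the snippet above, offered as a minimal sketch rather than a confirmed fix: it requests predictions for DatasetType.Train but reads the paths from learn.data.valid_ds, and path.split('/')[:-1] keeps every path component except the filename. The plain-Python example below uses made-up paths and labels (nothing here is fastai API) to show the difference between [:-1] and taking the last component.

```python
from pathlib import PurePosixPath

# Hypothetical paths and labels, only for illustration.
paths = ["data/train/cat/cat_001.jpg", "data/train/dog/dog_014.jpg"]
labels = [0, 1]

# [:-1] drops the last component, so this keeps the directories, not the file.
dirs_only = [p.split("/")[:-1] for p in paths]

# .name (like split("/")[-1]) returns the final component, i.e. the filename.
fnames = [PurePosixPath(p).name for p in paths]

# Pairing is only meaningful if both lists come from the same dataset, in the same order.
pairs = list(zip(fnames, labels))
print(dirs_only)
print(pairs)
```

Fixing the slice alone would not help if the paths and the targets come from different datasets, so both points are worth checking.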

There’s a similar thread, “Get the filenames of the data in the Test set in the order they're predicted”, but there’s no definite, practical answer there either.

haverstind · July 19, 2019, 8:32pm

Hmm, I guess someone needs to submit a pull request to update the fast.ai docs and library to make this easier, but check out my previous posts: “How to review images for bad classifications” and “How do I get my list of predictions match the order of the images in my test folder?”

It’s not a copy&paste solution, but I think these posts should provide you with all the ingredients. I guess the main point for you is that you need the ds_idx instead of just iterating over .items.
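
The index-tracking idea can be sketched without fastai at all. This is only an illustration under assumptions: the lists are made up, and `order` stands in for whatever order a shuffled loader happens to yield samples in.

```python
import random

# Hypothetical dataset: items[i] belongs with targets[i].
items = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
targets = [10, 11, 12, 13]

# Stand-in for a shuffled loader: the order in which samples are visited.
order = list(range(len(items)))
random.Random(0).shuffle(order)

# What comes back from iterating in that order.
seen_targets = [targets[i] for i in order]

# Zipping against items in storage order ignores the visiting order.
naive = list(zip(items, seen_targets))

# Looking items up by the recorded dataset index keeps every pair consistent.
aligned = [(items[ds_idx], t) for ds_idx, t in zip(order, seen_targets)]
print(naive)
print(aligned)
```

Whenever the visiting order differs from the storage order, `naive` pairs filenames with the wrong targets, while `aligned` stays correct because each value carries its dataset index with it.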

muellerzr · July 19, 2019, 9:09pm

Go look at my new PR. Here is how I extrapolated:

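        # Excerpt only: `ax`, `row`, `col` and `x` come from the surrounding code, which is not shown.
        for j, idx in enumerate(self.tl_idx):
            da, cl = self.interp.data.dl(self.interp.ds_type).dataset[idx]
            # Look up the image and its file path with the same dataset index
            img, lbl = self.interp.data.valid_ds[idx]
            fn = self.interp.data.valid_ds.x.items[idx]
            # Keep only the part of the path that matches the filename pattern
            fn = re.search(r'([^/*]+)_\d+.*$', str(fn)).group(0)
            img.show(ax=ax[row, col])
            ax[row, col].set_title(fn)
            x += 1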
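
As a side note on the regular expression in the excerpt, here is a small sketch of what it returns for a few made-up paths (the paths are hypothetical; the pattern is the one shown above). It works when only the filename contains an underscore followed by digits, but it can return more than the filename, or nothing at all, in other cases. Taking the last path component is a simpler alternative if only the filename is needed.

```python
import re
from pathlib import PurePosixPath

pattern = r'([^/*]+)_\d+.*$'  # same pattern as in the excerpt above

# Hypothetical paths for illustration.
simple = "data/valid/cat/cat_123.jpg"
nested = "run_2/valid/cat_123.jpg"   # a directory name also matches "_<digits>"
plain = "data/valid/cat/kitten.jpg"  # no "_<digits>" anywhere

# group(0) is the whole match, starting at the first place the pattern fits.
simple_match = re.search(pattern, simple).group(0)
nested_match = re.search(pattern, nested).group(0)
no_match = re.search(pattern, plain)  # None, so .group(0) would raise

# The final path component, independent of any naming scheme.
names = [PurePosixPath(p).name for p in (simple, nested, plain)]
print(simple_match, nested_match, no_match, names)
```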

mkd · July 19, 2019, 9:25pm

Thank you, @haverstind and @muellerzr, I’ll check out both.

mkd · July 23, 2019, 6:08pm

Could you share a link to your PR? I couldn’t find it.

mkd · July 23, 2019, 6:13pm

@haverstind, I checked both of the linked posts, but it seems to me that what you did there was filter a subset of the predictions, and that’s why you needed to keep track of the indices. If no filtering is done in your loops, then i == ds_idx for all entries, so tracking it doesn’t really do anything. I tried to follow those snippets, but I am still getting inconsistent columns.

Could you please spell out what you had in mind? Maybe I misunderstood.
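
To make the point about i and ds_idx concrete, here is a small plain-Python sketch with made-up predictions (the names are hypothetical and not taken from the linked posts). Without filtering, the loop position and the dataset index coincide; they only diverge once some rows are dropped.

```python
# Hypothetical data: items[k], preds[k] and targets[k] describe the same sample.
items = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg", "img_4.jpg"]
preds = [0, 1, 1, 0, 1]
targets = [0, 0, 1, 1, 1]

# No filtering: the position in the loop equals the dataset index.
unfiltered = list(enumerate(zip(preds, targets)))
same = all(i == ds_idx for i, (ds_idx, _) in enumerate(unfiltered))

# Filtering to misclassified samples: keep the dataset index with each row.
wrong = [(ds_idx, p, t) for ds_idx, (p, t) in enumerate(zip(preds, targets)) if p != t]

# Position i in the filtered list no longer equals ds_idx, so index items by ds_idx.
wrong_files = [items[ds_idx] for ds_idx, _, _ in wrong]
print(same, wrong_files)
```

So if the columns are inconsistent even without filtering, the index bookkeeping is probably not the cause, and the order in which the predictions were produced is the next thing to check.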

muellerzr · July 23, 2019, 6:15pm

That’s the PR. The code I showed is how I got the filenames for the cases where two classes matched. You should be able to extrapolate from there. A link to the source code on my GitHub should be there as well.

muellerzr · July 23, 2019, 6:19pm

In the code, fn is the filename. You can see where it lives in items.

mkd · July 24, 2019, 6:01pm

I’m using 1.0.44, so I don’t have the class that you are using there. But surely it can’t be that hard to zip together preds and fnames? This is such a frustrating thing; it should be a no-brainer…

muellerzr · July 24, 2019, 6:05pm

I show it with img and fname. Give me a few minutes and I’ll show a better example.

muellerzr · July 24, 2019, 6:20pm

This isn’t quite “zipping”, as I don’t know how to do that in Python yet, but the source code for that function is here.

Look under _plot_imgs.

Though, looking back, it may not be the best choice, as we assume losses. Is your data labeled, or are we doing this on test sets?
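
For reference, the “zipping” step itself is short in plain Python. The sketch below assumes you already have filenames and one row of class probabilities per file in the same order; all values and names are made up for illustration.

```python
# Hypothetical stand-ins: one probability row per file, in the same order.
fnames = ["cat_001.jpg", "dog_014.jpg", "cat_007.jpg"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
classes = ["cat", "dog"]

# zip() silently stops at the shorter input, so check the lengths first.
if len(fnames) != len(probs):
    raise ValueError("filenames and predictions differ in length")

# Index of the largest probability in each row.
pred_idx = [max(range(len(row)), key=row.__getitem__) for row in probs]

# One (filename, predicted class) pair per sample.
results = [(f, classes[i]) for f, i in zip(fnames, pred_idx)]
print(results)
```

The length check matters because a loader that drops samples would otherwise produce a shorter prediction list that zip() pairs up without complaint.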

Abhijeet01 · January 10, 2020, 2:03pm

Thanks @muellerzr.
This is what I was looking for.