Getting filenames and corresponding predictions

(Mike) #1

I have been trying to construct a list of pairs of filenames and the corresponding ground truths. Here’s how I tried to do this:

train_preds, train_ground_truths = learn.get_preds(ds_type=DatasetType.Train)
train_paths = list(map(lambda path: str(path), learn.data.valid_ds.items))
train_fnames = list(map(lambda path: path.split('/')[:-1], train_paths))
train_fnames_and_ground_truths = list(zip(train_fnames, train_ground_truths))

However, I did a few manual checks and it seems that the two arrays I am zipping together here are completely misaligned… How can I do this?
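
Two details in the snippet above may explain part of the mismatch: the predictions come from the training set while the paths are read from learn.data.valid_ds, and path.split('/')[:-1] keeps every path component except the filename. Below is a minimal sketch of the filename step in plain Python, assuming the paths are already in the same order as the ground truths. The sample paths and labels are made up for illustration.

```python
from pathlib import Path

# Hypothetical sample data standing in for the dataset items and their labels.
paths = ["data/train/cat/cat_001.jpg", "data/train/dog/dog_002.jpg"]
ground_truths = [0, 1]

# [:-1] drops the filename and keeps only the parent directories.
parents_only = paths[0].split("/")[:-1]

# Path(...).name (or split('/')[-1]) keeps just the filename.
fnames = [Path(p).name for p in paths]

# Pair each filename with the label at the same position.
fnames_and_ground_truths = list(zip(fnames, ground_truths))

print(parents_only)
print(fnames_and_ground_truths)
```

This only fixes the filename extraction; it does not by itself guarantee that the two lists are in the same order.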

There’s a similar thread, “Get the filenames of the data in the Test set in the order they're predicted”, but there’s no definite, practical answer there either.


#2

Hmm, I guess someone needs to submit a pull request to update the fast.ai docs and library to make this easier, but check out my previous posts: “How to review images for bad classifications” and “How do I get my list of predictions match the order of the images in my test folder?”

It’s not a copy-and-paste solution, but I think these posts should provide you with all the ingredients. I guess the main point for you is that you need the ds_idx instead of just iterating over .items.
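
The idea of carrying a dataset index along with each prediction can be sketched without fastai. This is a toy illustration under stated assumptions, not the code from the linked posts: items, fake_predict and the shuffled order are all made up.

```python
import random

# Hypothetical dataset: items[i] is the filename for sample i.
items = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

def fake_predict(fname):
    # Stand-in for a model: "predicts" the first letter of the filename.
    return fname[0]

# A shuffled loader visits the samples in a different order than items.
rng = random.Random(0)
order = list(range(len(items)))
rng.shuffle(order)

# Keep the dataset index next to each prediction as it is produced.
indexed_preds = [(ds_idx, fake_predict(items[ds_idx])) for ds_idx in order]

# Look the filename up by index instead of relying on position.
fnames_and_preds = [(items[ds_idx], pred) for ds_idx, pred in indexed_preds]
print(fnames_and_preds)
```

In fastai v1 the training DataLoader shuffles by default, which would put get_preds output in a different order from .items. DatasetType.Fix is meant to expose the training set without shuffling, though that is worth verifying against the installed version.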


(Zachary Mueller) #3

Go look at my new PR 🙂 Here is how I extrapolated:

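        for j, idx in enumerate(self.tl_idx):
            da, cl = self.interp.data.dl(self.interp.ds_type).dataset[idx]
            # ... (a line is elided here in the post; the deeper-indented
            #      block below presumably sat under a condition)
                img, lbl = self.interp.data.valid_ds[idx]
                # the filename sits in the dataset's items at the same index as the image
                fn = self.interp.data.valid_ds.x.items[idx]
                # keep the path from the first "<name>_<digits>" segment onward (usually the filename)
                fn = re.search(r'([^/*]+)_\d+.*$', str(fn)).group(0)
                img.show(ax=ax[row, col])
                ax[row, col].set_title(fn)
                x += 1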

(Mike) #4

Thank you, @haverstind and @muellerzr, I’ll check out both.


(Mike) #5

Could you share a link to your PR? I couldn’t find it.


(Mike) #6

@haverstind, I checked both of the linked posts, but it seems to me that what you did there was filter a subset of the predictions, which is why you needed to keep track of the indices. If no filtering is done in your loops, then i == ds_idx for all entries, so it’s not really doing anything. I tried to follow those snippets, but I am still getting inconsistent columns:

image

Could you please spell out what you had in mind, maybe I misunderstood?


(Zachary Mueller) #7

That’s the PR. The code I showed is how I got the filenames for the cases where two classes matched. You should be able to extrapolate from there. The source code link to my GitHub should be there too.


(Zachary Mueller) #8

In the code, fn is the file name. You can see where it lives in items.


(Mike) #9

I’m using 1.0.44, so I don’t have the class that you are using there. But surely it can’t be that hard to zip together preds and fnames? This is such a frustrating thing; it should be a no-brainer… 😕


(Zachary Mueller) #10

I show it with img and fname. Give me a few minutes and I’ll show a better example.


(Zachary Mueller) #11

This isn’t quite “zipping”, as I don’t know how to do that in Python yet, but the source code for that function is here.

Look under _plot_imgs.

Though looking back, it may not be the best choice, as we assume losses. Is your data labeled, or are we doing this on test sets?
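
As for the zipping itself, Python's built-in zip pairs two sequences element by element. Here is a small sketch that turns per-class scores into class names and pairs them with filenames, assuming both lists are already in the same order. The classes, filenames and scores are invented for illustration.

```python
# Hypothetical class names, filenames and per-class scores (one row per file).
classes = ["cat", "dog"]
fnames = ["cat_001.jpg", "dog_002.jpg", "dog_003.jpg"]
preds = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

def argmax(row):
    # Index of the largest score in the row.
    return max(range(len(row)), key=row.__getitem__)

# Map each row of scores to the name of its highest-scoring class.
pred_labels = [classes[argmax(row)] for row in preds]

# zip pairs the i-th filename with the i-th predicted label.
fnames_and_preds = list(zip(fnames, pred_labels))
print(fnames_and_preds)
```

zip stops at the shorter input, so comparing len(fnames) with len(preds) first is a cheap way to catch a mismatch.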


(Abhijeet Shettiwar) #12

Thanks @muellerzr.
This is what I was looking for.
