Mapping top losses back to original files (CNN)

julclu · April 2, 2019, 5:39pm

This may be a silly or obvious question, but I’m wondering if there is a way to map the images with the greatest classification loss back to their original file names. I have searched the forums and can’t find a way to do this. I am mostly referring to the images shown by interp.plot_top_losses().

The only way I can think of is to pass each one of my validation images through the CNN again one by one, observe the loss, and record it. I am curious because all of my files in my dataset were manually labeled, and I believe there could be some labeling errors that I would like to check, starting with the top losses.

Please let me know if this is a solved problem!

shawn · April 2, 2019, 8:19pm

Hi. Try calling ClassificationInterpretation.top_losses() instead. This returns the losses and the indices of the corresponding images in your validation set, in descending order of loss.

To get the filenames, you can index into the .valid_ds property of your DataBunch.

Update:

My response was almost right, but not quite. Indexing directly into the valid_ds property will yield a tuple of type (fastai.vision.image.Image, fastai.core.Category). This does not give you the filenames. To get the filenames, index into .valid_ds.items instead.
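To make the index-to-filename step concrete, here is a minimal sketch in plain Python that does not need fastai. The function name top_loss_files and the sample losses and paths are made up for illustration. In a real notebook the losses and indices would come from top_losses() and the file list from .valid_ds.items.

```python
from typing import List, Sequence, Tuple


def top_loss_files(
    losses: Sequence[float], items: Sequence[str], k: int
) -> List[Tuple[str, float]]:
    """Return the k (filename, loss) pairs with the highest loss, largest first.

    Hypothetical helper: `losses[i]` is assumed to be the loss of `items[i]`,
    the same positional pairing that validation-set indices rely on.
    """
    if len(losses) != len(items):
        raise ValueError("losses and items must have the same length")
    # Sort positions by loss, highest first, then keep the first k.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(items[i], losses[i]) for i in order[:k]]


# Made-up stand-ins for per-image losses and validation file paths.
losses = [0.12, 2.31, 0.05, 1.47]
items = ["cat/001.jpg", "dog/014.jpg", "cat/007.jpg", "dog/002.jpg"]

for rank, (path, loss) in enumerate(top_loss_files(losses, items, k=2), start=1):
    print(f"[{rank}] {path} (loss {loss:.2f})")
```

The important part is that the indices are positions in the validation set, so they only make sense when used against the validation file list, not the training one.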

julclu · April 2, 2019, 9:05pm

@shawn thanks!

soundly_typed · September 22, 2020, 10:19pm

For a slightly fuller example, here’s what I did to output the filenames along with the visual from plot_top_losses (variable names taken from course lesson 02):

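# top_losses returns the losses and their validation-set indices, highest loss first
losses, idxs = interp.top_losses(12)
for rank, i in enumerate(idxs, start=1):
    print(f"[{rank}] {dls.valid_ds.items[int(i)]}")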
interp.plot_top_losses(12, nrows=3)