When using the Dataloader API, I often come across badly labeled data (while using the very convenient show_batch(), show_results(), or plot_top_losses()). When I see this, I want to know which source image file has the labeling problem, as it may indicate that more images in its vicinity are corrupted.
I tried to modify those functions to do it, but got stuck because I’m not sure the original file information is even kept. Does anybody have an idea how I could do that? How can I keep track of the original files that created the samples in the dataloader?
I found these answers from previous versions:
Using the answer above, I managed to recover the filename by taking the idxs of the top losses:
losses,idxs = interp.top_losses(10)
and getting this list
[ 493, 209, 711, 93, 862, 1082, 226, 708, 864, 111]
The interesting (badly labeled) image is the last one, so I use:
and get the filename.
The weird part is that when I inspect the image under the filename I obtained, I don’t see the same image as the last image in interp.plot_top_losses(10)! Actually, none of the images in the top losses plot correspond to the images obtained with this method.
My mistake! I used a different dataloader object to get the file names, so the indices pointed at different files. All is good now, and the method above solved my problem. Maybe I should leave this post here in case someone else needs to relate the file names to the results?
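For anyone landing here later, this is a rough sketch of the idea in plain Python, not the exact code I ran. The helper name files_for_indices and the file paths are made up for illustration. I am assuming the validation dataset keeps its file list in something like dls.valid_ds.items (which I believe is the case in fastai v2 when the dataset is built from file paths), and that the indices come from interp.top_losses(). The second half shows the mistake I made: the same indices looked up in a differently ordered dataset give different files.

```python
from pathlib import Path


def files_for_indices(items, idxs):
    """Return the source file for each dataset index, in the order given.

    int(i) lets this accept plain ints as well as tensor elements.
    """
    return [Path(items[int(i)]) for i in idxs]


# Hypothetical stand-in for the validation set's file list.
# Assumption: in fastai v2 this would be something like dls.valid_ds.items,
# taken from the SAME dataloaders object the learner/interpretation used.
valid_items = [Path(f"images/img_{i:04d}.jpg") for i in range(1200)]

# Indices as obtained from: losses, idxs = interp.top_losses(10)
idxs = [493, 209, 711, 93, 862, 1082, 226, 708, 864, 111]

top_loss_files = files_for_indices(valid_items, idxs)
worst_labelled = top_loss_files[-1]  # last image shown in the top losses plot
print(worst_labelled)

# The pitfall: a different dataloaders object may hold the files in a
# different order (or a different split), so the same index is another file.
other_items = list(reversed(valid_items))  # stand-in for "a different dataloader"
wrong_file = files_for_indices(other_items, idxs)[-1]
print(wrong_file == worst_labelled)  # the two lookups disagree
```

So the thing to check is that the file list you index into belongs to the same object that produced the interpretation, otherwise the indices are meaningless.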