This is incorrect: according to “plot top losses”, this should be an “other” and not a “bee”. I have verified by hand that the offending “other” image is indeed in the source data, but I need a correct index to it so that I can clean it up.
I guess that the observations are shuffled during training, which is OK, but how do I get the original index of the observation?
I need that to clean up the source dataset - it indeed contains some mislabelled data.
@georey
Hey! First off, this is a classification problem, but you’re using the Interpretation class. You should be using the ClassificationInterpretation class instead.
Anyways! Once you use that class, you’ll probably use the plot_confusion_matrix method to visualize what your model isn’t getting right.
Do tell me if I misunderstood anything.
Hope this helps. Cheers!
OK, I got it. When calling interp = ClassificationInterpretation.from_learner(learn), the Interpretation() class initializer stores its own copy of the dataset in the interp.dl attribute. So I can now say:
losses,idxs = interp.top_losses()
interp.dl.items
and that gives me the shuffled DataFrame that was used to compute the losses when the ClassificationInterpretation was created,
so that I can take the 5 worst observations this way:
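A minimal sketch of that lookup, not necessarily the exact code from the post. It assumes the indices returned by interp.top_losses() are positions into interp.dl.items, as described above, and that items is either a DataFrame or a plain list of file paths. The helper name worst_items is made up for illustration, and none of this has been checked against a particular fastai version.

```python
def worst_items(interp, k=5):
    """Return (dataset_index, item, loss) for the k highest-loss observations.

    Hypothetical helper: assumes interp.top_losses(k) returns (losses, idxs)
    and that idxs are positions into interp.dl.items.
    """
    losses, idxs = interp.top_losses(k)
    items = interp.dl.items
    rows = []
    for loss, idx in zip(losses, idxs):
        i = int(idx)  # accepts plain ints as well as 0-d tensors
        # DataFrame items need positional .iloc; list-like items use plain indexing
        item = items.iloc[i] if hasattr(items, "iloc") else items[i]
        rows.append((i, item, float(loss)))
    return rows
```

Calling worst_items(interp, 5) would then give the dataset position, the row or file path, and the loss for each of the 5 worst observations, which is the information a relabelling tool needs.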
Yup, I did that, but the question is how to get the actual index, file name, and path of an item that I spot as mislabelled, so that I can build a nice-to-use relabelling tool.
Anyways, I went through the fast.ai source code and I figured it out.