Find path to file (ImageDataBunch)

I am trying to follow https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson1-pets.ipynb using my own data.

path = Path('some path to data in image-net style folder format') 
np.random.seed(42)
data = ImageDataBunch.from_folder(path, size=224, num_workers=4)

After training my model, I created a confusion matrix and now want to find the images that were classified incorrectly:

interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses(50)

data.valid_ds[idxs[0]] gives me a misclassified image, and I can even plot it, but my dataset has a couple of thousand images and I do not want to visually check all of them to find the ones that were classified incorrectly.

My dataset is not very clean, and I want to delete the misclassified examples that were actually put in the wrong directory, so I need their file paths.

If you go on to lesson 2, Jeremy mentions FileDeleter, which does exactly that. If you use Google Colab, I'm working on a few plugins and porting FileDeleter to it, but for now there are other ways to get around it.


@muellerzr already mentioned FileDeleter, which worked great for me. Just in case you need to get the file name anyway (e.g. to automate stuff outside of a Jupyter notebook), have a look at the thread "How do I get my list of predictions match the order of the images in my test folder?"
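For the "automate stuff outside of a Jupyter notebook" case, here is a minimal sketch of what could be done once the file paths are known. It uses only the Python standard library; the helper name quarantine and the review folder are hypothetical, and it moves files rather than deleting them so that a wrong guess can be undone.

```python
import shutil
from pathlib import Path


def quarantine(paths, review_dir):
    """Move each file into review_dir/<class folder name>/ and return the new paths.

    Hypothetical helper: assumes an image-net style layout where the parent
    folder of each file is its class label.
    """
    review_dir = Path(review_dir)
    moved = []
    for p in map(Path, paths):
        # Keep the original class folder name so the file can be put back later.
        target_dir = review_dir / p.parent.name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / p.name
        shutil.move(str(p), str(target))
        moved.append(target)
    return moved
```

Moving the suspects into a separate folder keeps them out of the next training run while still letting you inspect them by hand.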


@haverstind thank you for this!!! I’ve been struggling to find that for days now!!! I can’t appreciate that enough.

Glad I could help.

I found this by reading the source code.

  • Basically, my idea was that the DataBunch/ImageList would not load everything into memory at startup, so the file information had to be stored somewhere.
  • Next, I checked where open_image() is called in the fastai library.
  • ImageList.open() was the interesting place.
  • That method is called by ImageList.get().
  • .get() does fn = super().get(i)
  • super() refers to fastai.data_block.ItemList
  • ItemList.get() just does return self.items[i], so the file paths live in .items.
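The call chain above can be sketched with two tiny stand-in classes. These are not the fastai classes themselves, only a simplified imitation of the pattern described in the list; paths_for is a hypothetical helper, and the file names are made up for illustration.

```python
from pathlib import Path


class ItemList:
    """Stand-in for fastai.data_block.ItemList: stores the raw items (file paths)."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        return self.items[i]


class ImageList(ItemList):
    """Stand-in for ImageList: get() looks up the path, then opens the file."""

    def open(self, fn):
        # The real method calls open_image(fn); here we only tag the path.
        return f"<image loaded from {fn}>"

    def get(self, i):
        fn = super().get(i)
        return self.open(fn)


def paths_for(item_list, idxs):
    """Hypothetical helper: read the paths straight from .items, skipping open()."""
    return [item_list.items[int(i)] for i in idxs]


x = ImageList([Path("cats/1.jpg"), Path("dogs/2.jpg"), Path("cats/3.jpg")])
top_idxs = [2, 0]  # e.g. indices returned by a top-losses call
print(paths_for(x, top_idxs))
```

In fastai v1 the analogous lookup would presumably be data.valid_ds.x.items[idxs], assuming the validation set exposes its ImageList as .x; check this against your installed version.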