Filtering the classes within the dataloader object

asifdegr8 · March 21, 2023, 3:32pm

My use case is to load image data from the file system, where images are placed in subfolders and each subfolder name represents a class name.
There are 9 classes in total (9 subfolders), but I am only interested in 3 of them. I filtered the classes in the dataloader object, but when I display the confusion matrix, it shows all 9 classes.

Here is my code to create dataloader object.

classes = ['Earwax', 'Aom', 'Normal']
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, bs=64, item_tfms=Resize(224), classes=classes)

When I use the statement
dls.show_batch(figsize=(10,8))
This displays images from the three classes only. Fine.

On printing the number of classes
print(dls.c)
This returns 9, but I am expecting 3.

Creating confusion matrix
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(20,20), dpi=60)

This creates a confusion matrix of all 9 classes, whereas I am expecting only 3. Is the model trained on all 9 classes or on 3 classes only?

lucasvw · March 21, 2023, 3:45pm

Apparently the dataloader is still looking at all folders, even though you gave it the explicit class names. The easiest fix is probably to just remove the subfolders of the classes you are not interested in. Or, in case you don't want to remove the data: create a new folder data_folder, create symlinks inside this folder to the three class folders you are interested in, and then create your dataloaders from this data_folder.
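
The symlink approach could be sketched roughly like this, using only the Python standard library. The helper name link_selected_classes and its arguments are made up for illustration; it assumes the class folders sit directly under the source folder and that your OS allows creating symlinks (on Windows this may need extra privileges).

```python
from pathlib import Path

def link_selected_classes(src_root, dst_root, class_names):
    """Hypothetical helper: fill dst_root with one symlink per selected class folder."""
    src_root = Path(src_root).resolve()   # absolute targets keep the links valid from anywhere
    dst_root = Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    for name in class_names:
        src = src_root / name
        if not src.is_dir():
            raise FileNotFoundError(f"missing class folder: {src}")
        link = dst_root / name
        # Skip links left over from an earlier run instead of failing
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(src, target_is_directory=True)
    return dst_root
```

You would then call something like link_selected_classes(path, 'data_folder', classes) and pass the returned folder to ImageDataLoaders.from_folder in place of path. It is worth checking dls.vocab and dls.c afterwards to confirm that only the three classes were picked up.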