Trying to Get the Names of Images the Model Mispredicted

KareemAfifi · April 21, 2022, 2:12pm

I am working on a project to detect Covid-19 from chest X-rays (binary classification).
The directory is divided into (train, val, test), and within each folder there are (Normal, Covid) subfolders.
I am trying to get the names of the images that the model mispredicted.
Here is the code:

path=Path('/content/XRays')
np.random.seed(41)
data = ImageDataLoaders.from_folder(path,train="train", valid ="val",
        ds_tfms=aug_transforms(), size=256, bs=32, num_workers=4)

model = nn.Sequential(
    conv_block(3, 8),  # and a couple of other layers (omitted)
)
learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit_one_cycle(100, lr_max=3e-3)


testingdata = ImageDataLoaders.from_folder(path,train='test',
        ds_tfms=aug_transforms(), size=256, bs=32, num_workers=4)
# Getting predictions

preds,y = learn.get_preds(dl=testingdata.train)
# Getting statistics
probs=preds
targets=y
probs = np.argmax(probs, axis=1)
correct = 0
for idx, pred in enumerate(probs):
    if pred == targets[idx]:
        correct += 1
    
accuracy = correct / len(probs)
print(len(probs), correct, accuracy)

KevinB · April 21, 2022, 10:15pm

Check out "Interpretation of Predictions | fastai" in the fastai docs.

You can use that to get an interp object and then call something like interp.top_losses(9, items=True) to get the worst-performing outputs and the items associated with them.
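
To illustrate why the highest-loss items are a reasonable place to look, here is a minimal pure-Python sketch. It assumes two classes and a per-item cross-entropy loss computed on softmax probabilities (as with nn.CrossEntropyLoss); the probabilities and targets are made-up values, not results from this thread. Under those assumptions an item is mispredicted exactly when its true class gets probability below 0.5, that is, when its loss exceeds log(2).

```python
import math

def cross_entropy(probs, target):
    """Per-item cross-entropy: -log of the probability given to the true class."""
    return -math.log(probs[target])

# Hypothetical softmax outputs for four images (columns: class 0, class 1).
probs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
targets = [0, 0, 1, 1]

losses = [cross_entropy(p, t) for p, t in zip(probs, targets)]
pred_classes = [p.index(max(p)) for p in probs]

# Indices where the argmax prediction differs from the target.
wrong_by_argmax = [i for i, (c, t) in enumerate(zip(pred_classes, targets)) if c != t]

# Indices whose loss is above log(2), i.e. true-class probability below 0.5.
wrong_by_loss = [i for i, loss in enumerate(losses) if loss > math.log(2)]

print(wrong_by_argmax, wrong_by_loss)
```

With more than two classes, or with a different loss function, this equivalence no longer holds in general, so comparing predicted and actual labels directly is the safer check.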

KareemAfifi · April 22, 2022, 12:50pm

Thank you for your reply. I have tried the Interpretation class, but it didn't yield the results I expected. According to the confusion matrix I ran, there are 23 mispredictions out of 600 images (testing data).
Here is the code I ran from the Interpretation class

interp.top_losses seems to return only the biggest differences between the categories. I would really appreciate your help with this.
Regards

KevinB · April 22, 2022, 2:06pm

Can you explain a little more about what you want? My thought was that you could grab the top 23 items from interp.top_losses like this:

loss_val, loss_idx, items = interp.top_losses(23, items=True)

That should tell you which items they are. Another option you could look into is testingdata.items, which I think should line up with your preds, y values (I'm not super confident about this, so you should definitely test it if you go this route).
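
A rough sketch of that second option, with the caveat above still applying: it only works if the items, predictions, and targets are in the same order (a shuffled DataLoader would break the alignment). The helper misclassified_items and the sample paths are hypothetical names made up for illustration, not part of fastai.

```python
from pathlib import Path

def misclassified_items(items, pred_classes, targets):
    """Return (item, predicted, actual) for every position where they disagree.

    Assumes all three sequences have the same length and the same order.
    """
    if not (len(items) == len(pred_classes) == len(targets)):
        raise ValueError("items, pred_classes and targets must have the same length")
    return [
        (item, int(p), int(t))
        for item, p, t in zip(items, pred_classes, targets)
        if int(p) != int(t)
    ]

# Hypothetical stand-ins. With fastai, pred_classes might be preds.argmax(dim=1)
# and targets might be y, both from learn.get_preds(...); items would be the
# list of file paths behind the DataLoader that was passed as dl (untested here).
items = [Path("test/Covid/a.png"), Path("test/Normal/b.png"), Path("test/Covid/c.png")]
pred_classes = [0, 0, 1]
targets = [0, 1, 1]

wrong = misclassified_items(items, pred_classes, targets)
for path, predicted, actual in wrong:
    print(path.name, "predicted", predicted, "actual", actual)
```

A quick sanity check is to compare len(wrong) with the number of off-diagonal entries in the confusion matrix (23 in this thread); if the counts differ, the ordering assumption is probably wrong.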

KareemAfifi · April 22, 2022, 2:35pm

Hi Kevin,
First of all, thanks for helping me. I want to get the names of the images (paths if possible) so that I can see why these images were mispredicted (maybe the resolution was not good, etc.). I will try this approach and let you know.

KareemAfifi · April 22, 2022, 5:56pm

I think the testingdata.items approach is easier, as top_losses seems not to be the best solution. So if possible, solving the other question will solve this issue as well.