I am working on a project to detect Covid-19 from chest X-rays (binary classification).
The directory is divided into train, val, and test folders, and each of them contains two subfolders (Normal and Covid).
I am trying to get the names of the images that the model mispredicted.
Here is the code:
from fastai.vision.all import *

path = Path('/content/XRays')
np.random.seed(41)
data = ImageDataLoaders.from_folder(path, train="train", valid="val",
                                    ds_tfms=aug_transforms(), size=256, bs=32, num_workers=4)
model = nn.Sequential(
    conv_block(3, 8),  # ... and a couple of other layers
)
learn = Learner(data, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit_one_cycle(100, lr_max=3e-3)
testingdata = ImageDataLoaders.from_folder(path, train='test',
                                           ds_tfms=aug_transforms(), size=256, bs=32, num_workers=4)
# Getting predictions
preds, y = learn.get_preds(dl=testingdata.train)
# Getting statistics
probs = preds
targets = y
probs = preds.argmax(dim=1)  # predicted class index per image
correct = 0
for idx, pred in enumerate(probs):
    if pred == targets[idx]:
        correct += 1
test_accuracy = correct / len(probs)  # renamed so it does not shadow the fastai accuracy metric
print(len(probs), correct, test_accuracy)
You can use that to get an interp object and then do something like interp.top_losses(9, items=True) to get the worst-performing outputs and the items associated with them.
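To make clear what that ranking gives you, here is a minimal plain-Python sketch of the idea behind top_losses: compute a per-item cross-entropy loss and keep the k largest. This is not fastai's implementation; the function name top_k_losses and the toy file names are made up for illustration. One thing it shows is that the k largest losses are not the same as the set of misclassified items, since a correct but low-confidence prediction can rank above other items.

```python
import math

def top_k_losses(items, probs, targets, k):
    """Return the k (loss, item) pairs with the largest cross-entropy loss.

    items   -- file names, one per image (hypothetical examples below)
    probs   -- per-image class probabilities, e.g. [[0.9, 0.1], ...]
    targets -- true class index per image
    """
    # Cross-entropy for one item is -log(probability assigned to the true class).
    losses = [-math.log(p[t]) for p, t in zip(probs, targets)]
    ranked = sorted(zip(losses, items), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

items = ["a.png", "b.png", "c.png", "d.png"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45], [0.3, 0.7]]
targets = [0, 0, 0, 1]

worst = top_k_losses(items, probs, targets, k=2)
print([name for _, name in worst])  # ['b.png', 'c.png']
```

Here b.png is a real misprediction, but c.png is classified correctly (0.55 for the true class) and still lands in the top 2 because the model was unsure about it.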
Thank you for your reply. I have tried the Interpretation class, but it didn't yield the results I expected. From the confusion matrix I ran, there are 23 mispredictions out of 600 (testing data).
Here is the code I ran from the Interpretation class:
interp.top_losses seems to return only the items with the biggest difference between the categories. I would really appreciate it if you could help me with this.
Regards
That should tell you which items they are. Another option you could look into is using testingdata.items, which I think should line up with your preds, y values (not super confident on this, so you should definitely test it out if you go this route).
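Here is a rough sketch of that pairing in plain Python, so the logic can be checked without a trained model. The helper name mispredicted_items and the toy paths are made up. In fastai, items would be the file list of the dataloader you pass to get_preds, and the pairing only holds if that dataloader neither shuffles nor drops the last batch (a training dataloader normally does both). Building the dataloader with something like learn.dls.test_dl(files, with_labels=True) is probably the safer route, but treat that as an assumption to verify on your fastai version.

```python
from pathlib import Path

def mispredicted_items(items, pred_classes, targets):
    """Return (path, predicted, actual) for every wrong prediction.

    Assumes items[i], pred_classes[i] and targets[i] all refer to the
    same image, i.e. predictions were produced in item order.
    """
    if not (len(items) == len(pred_classes) == len(targets)):
        raise ValueError("items, predictions and targets must have equal length")
    return [
        (item, int(pred), int(actual))
        for item, pred, actual in zip(items, pred_classes, targets)
        if int(pred) != int(actual)
    ]

# Toy stand-ins for dl.items, preds.argmax(dim=1) and y (hypothetical values;
# class 0 = Covid, class 1 = Normal in this example).
items = [Path("test/Normal/n1.png"), Path("test/Covid/c1.png"), Path("test/Covid/c2.png")]
pred_classes = [1, 0, 1]
targets = [1, 0, 0]

wrong = mispredicted_items(items, pred_classes, targets)
for path, pred, actual in wrong:
    print(path.name, "predicted", pred, "actual", actual)
```

The length check is there on purpose: if the dataloader dropped its last batch, the number of predictions would be smaller than the number of items, and that mismatch is a quick sign the two lists do not line up.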
Hi Kevin,
First of all, thanks for helping me. I want to get the names of the images (the paths, if possible) so that I can see why these images were mispredicted (maybe the resolution was not good, etc.). I will try this approach and let you know.
I think the testingdata.items approach is easier, as top_losses does not seem to be the best solution. So, if possible, solving the other question will solve this issue as well.