I built an image classification model using `cnn_learner` and checked the most confused images with the `ClassificationInterpretation` class. Those images are very difficult for humans to identify too. Is there a way I can remove them from my dataset and retrain? When I plot the most confused images I see only the images, not their file names, and since my dataset has 5000+ images it's difficult to find them manually.
If those confusable images are typical of what the model will see in reality, and they are in fact correctly labelled, then it is important to leave them in the dataset. Although the model may not have predicted them correctly, their presence in the training data may have helped the model classify other images correctly. Take all the confused images out, retrain, and you may simply find a new set of confused images. Try it and see whether that is true for your dataset. Sometimes it can even be more useful to remove the *least* confused images!
Moreover, even if such images are incorrectly labelled, it can sometimes still be worth keeping them in. For example, in a competition or challenge the goal is not to be correct, but to guess what the labellers labelled, and it's common for labellers to be consistently wrong.
If you take a look at the source code behind the most-confused plot (`plot_top_losses` in fastai), you'll see how to get the label, confidence, and file name of each image.
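To make that concrete, here is a minimal sketch of the indexing logic: given per-image losses and the list of file paths in the same order, sort by descending loss and look up the paths. The fastai-specific names mentioned in the comments (`interp.top_losses()`, `interp.dl.dataset.items`) are my assumption of roughly where these come from in fastai v2; the sorting and lookup below is framework-independent.

```python
# Toy stand-ins: one loss per validation image, and the matching file paths.
# In fastai these would come from something like interp.top_losses() and
# the dataset's items list (an assumption about the API, not a guarantee).
losses = [0.12, 2.31, 0.05, 1.87]
items = ["cat1.jpg", "dog7.jpg", "cat3.jpg", "dog2.jpg"]

# Sort indices by descending loss, then look up each file name.
order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
worst = [(items[i], losses[i]) for i in order[:2]]

for path, loss in worst:
    print(f"{path}: loss={loss:.2f}")
# prints the two highest-loss images: dog7.jpg then dog2.jpg
```

Once you have the paths of the highest-loss images, you can inspect them, fix labels, or filter them out of the training set before retraining.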