I'm working through the Cats vs. Dogs lesson right now and looking at the images that the model had the hardest time classifying. Some of those examples show neither a dog nor a cat, so they are obviously impossible to classify correctly.
What is best practice in situations like this? Let’s say this was a Kaggle competition. What should one do?
Should samples like that be left in the data or cleaned out?
I am thinking that cleaning out samples like that would be better, but I am not sure.
My thinking goes like this:
- These bad samples do not contribute to training my model (because their class is undefined)
- Even if bad samples like these are later presented to the trained model at prediction time, the outcome is undefined anyway, so it will not matter whether my model saw such examples during training.
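
To make concrete what I mean by "cleaning out", here is a rough sketch of the workflow I have in mind. It is not from the lesson; it assumes I already have the model's predicted probabilities as a NumPy array, and the helper names `top_loss_indices` and `drop_samples`, the file names, and the numbers are all made up for illustration. The idea is to rank samples by cross-entropy loss, look at the worst ones by hand, and drop only those I confirm are neither cat nor dog.

```python
import numpy as np

def top_loss_indices(probs, labels, k):
    """Return indices of the k samples with the highest cross-entropy loss."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Probability the model assigned to each sample's labelled class.
    p_true = probs[np.arange(len(labels)), labels]
    # Clip to avoid log(0) for extremely confident wrong predictions.
    losses = -np.log(np.clip(p_true, 1e-12, 1.0))
    # Sort by descending loss and keep the first k.
    return np.argsort(-losses)[:k]

def drop_samples(items, bad_indices):
    """Return items with the given indices removed."""
    bad = {int(i) for i in bad_indices}
    return [item for i, item in enumerate(items) if i not in bad]

# Toy data (hypothetical): columns are P(cat), P(dog); labels 0 = cat, 1 = dog.
files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
probs = np.array([[0.90, 0.10],
                  [0.20, 0.80],
                  [0.45, 0.55],
                  [0.05, 0.95]])
labels = np.array([0, 1, 0, 0])

# Step 1: shortlist the hardest samples for manual review.
candidates = top_loss_indices(probs, labels, k=2)

# Step 2: after looking at them, suppose only sample 3 is neither cat nor dog.
confirmed_bad = [3]
cleaned = drop_samples(files, confirmed_bad)
```

The manual review step matters here: a high loss alone does not mean a sample is bad, since it could just be a hard but valid cat or dog.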
Am I making a logical mistake here?