Applying ClassificationInterpretation methods on the training set

Hey guys,

What is the best way to apply the ClassificationInterpretation methods to the training set? For example, to plot the confusion matrix or to get the top/bottom losses on the training set. This could be useful for comparing training and validation results for signs of overfitting, but the main reason I want it is to apply the FileDeleter class from lesson 2 to clean up the training set as well. To do that, it seems we need a way to apply the top_losses() method to the training set.

(This is related to my question in the lesson 2 discussion: https://forums.fast.ai/t/lesson-2-in-class-discussion/28632/340?u=bartp88. It got buried, so I hope it’s okay that I made a new topic out of my question)
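For reference, this is roughly the standard validation-set workflow in fastai v1 that I'd like to reproduce on the training set (a minimal, self-contained sketch on MNIST_SAMPLE, just so there's something runnable):

```python
from fastai.vision import *

# Minimal end-to-end example (MNIST_SAMPLE just to have something small to train on)
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = create_cnn(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(1)

# Interpretation runs on the *validation* set by default
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
losses, idxs = interp.top_losses()        # highest validation losses first
interp.plot_top_losses(9, figsize=(7, 7))
```

I'd like the same thing, but with the losses computed over the training items so I can feed them to FileDeleter.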

Your best bet is probably to run what is called “cross-validation”: in essence, you cycle through all of your data, giving each image a chance to be in the validation set. E.g. if your validation set is 20%, you cut your data into 5 pieces, ABCDE. Then you train on ABCD with E as validation, then on ABCE with D as validation, and so forth. In fastai v1, AFAIK, you'd have to read up on creating your own datasets to get this working.
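Something like this for generating the folds (sklearn's KFold here only produces the index splits; hooking valid_idx back into fastai, e.g. via split_by_idx in the data block API, is left as a sketch, and the image folder path is made up):

```python
from pathlib import Path
from sklearn.model_selection import KFold

# Gather all images, then rotate which 20% plays the role of validation set
files = sorted(Path('data/images').rglob('*.jpg'))   # hypothetical image folder
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, valid_idx) in enumerate(kf.split(files)):
    # Build a DataBunch whose validation split is valid_idx (e.g. with the data
    # block API's split_by_idx), train a fresh learner on it, and run
    # ClassificationInterpretation on that fold -- every image ends up in a
    # validation set exactly once.
    print(f'fold {fold}: {len(train_idx)} train / {len(valid_idx)} valid')
```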

That’s only necessary if you have a very small dataset. Generally a standard validation set should be fine.

@zachcaceres is working on that today, so hopefully we’ll have a solution for you soon!

You can find a workaround for this here.
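(If you're on a newer fastai v1 release, ClassificationInterpretation.from_learner may also accept a ds_type argument, in which case something like the sketch below should work; DatasetType.Fix is the training data served without shuffling or augmentation, which keeps the top_losses indices aligned with the training items. Treat the argument name and enum member as assumptions and check the version you have installed.)

```python
from fastai.vision import *

# Sketch, assuming a fastai v1 version where from_learner accepts ds_type
# (`learn` is the trained Learner from the example earlier in the thread).
interp_train = ClassificationInterpretation.from_learner(learn, ds_type=DatasetType.Fix)
interp_train.plot_confusion_matrix()
losses, idxs = interp_train.top_losses()   # highest training-set losses first
# idxs refer to training items, so they can be mapped back to file paths
# for cleanup (e.g. with FileDeleter from lesson 2)
```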