Lesson 2 In-Class Discussion ✅

bartp88 · October 31, 2018, 12:56pm

In the part about cleaning up your dataset, Jeremy goes through this code:

losses,idxs = interp.top_losses()  # returns the top losses in the 
top_loss_paths = data.valid_ds.x[idxs]

Now we can pass in these paths to our widget.

fd = FileDeleter(file_paths=top_loss_paths)

And then he mentions that we should run the same thing, but replace valid_ds with train_ds to also clean up the training set. It seems to me that this isn’t correct. The indices returned by interp.top_losses() are indices into the validation set, so if we only replace valid_ds with train_ds, we will select images from our training set based on the losses of validation images. It seems we should first modify something so that the interp instance is applied to the training set instead of the validation set (not sure how, though?). Or am I misunderstanding something?
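
To make the concern concrete, here is a minimal sketch using plain NumPy arrays rather than the fastai API. The paths and loss values are made up, and the arrays are only hypothetical stand-ins for data.valid_ds.x and data.train_ds.x. It assumes, as described above, that the returned indices refer to positions in the validation set.

```python
import numpy as np

# Hypothetical stand-ins for data.valid_ds.x and data.train_ds.x (made-up paths).
valid_paths = np.array(["valid/a.jpg", "valid/b.jpg", "valid/c.jpg"])
train_paths = np.array(["train/p.jpg", "train/q.jpg", "train/r.jpg", "train/s.jpg"])

# Made-up per-image losses, computed on the validation set only.
valid_losses = np.array([0.1, 2.5, 0.7])

# Mimic the described behaviour: losses sorted from highest to lowest,
# together with the indices of the corresponding validation items.
idxs = np.argsort(-valid_losses)
losses = valid_losses[idxs]

# Indexing the validation paths gives the worst-predicted validation images.
top_valid = valid_paths[idxs]

# Indexing the training paths with the same idxs runs without error,
# but the selected images have nothing to do with these losses.
mismatched_train = train_paths[idxs]

# To rank training images, the losses would have to be computed on the
# training set itself (again made-up values here).
train_losses = np.array([0.3, 0.2, 1.9, 0.05])
train_idxs = np.argsort(-train_losses)
top_train = train_paths[train_idxs]

print(top_valid)
print(mismatched_train)
print(top_train)
```

The mismatched selection raises no error, which is what makes the problem easy to miss: it simply returns whichever training images happen to sit at the same positions.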