In the part about cleaning up your dataset, Jeremy goes through this code:
losses,idxs = interp.top_losses() # returns the top losses in the
top_loss_paths = data.valid_ds.x[idxs]
Now we can pass in these paths to our widget.
fd = FileDeleter(file_paths=top_loss_paths)
And then he mentions that we should run the same thing, but replace valid_ds by train_ds to also clean up the training set. It seems to me that this isn’t correct. The indices returned by interp.top_losses() are indices into the validation set, so if we only replace valid_ds by train_ds, we will select images from our training set based on the losses of validation images. It seems we should first modify something so that the interp instance is applied to the training set instead of the validation set (not sure how, though?). Or am I misunderstanding something?
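
To make the concern concrete, here is a toy sketch in plain Python. It does not use fastai at all: the two lists only stand in for data.valid_ds.x and data.train_ds.x, the helper top_loss_idxs is a hypothetical stand-in for what interp.top_losses() returns, and all file names and loss values are made up.

```python
# Toy illustration only: plain lists stand in for data.valid_ds.x and data.train_ds.x.
# All file names and loss values below are made up.
valid_paths = ["valid/a.jpg", "valid/b.jpg", "valid/c.jpg"]
train_paths = ["train/p.jpg", "train/q.jpg", "train/r.jpg", "train/s.jpg"]

# Hypothetical per-image losses, computed on the validation set only.
valid_losses = [0.1, 2.5, 0.7]


def top_loss_idxs(losses, k):
    """Return the indices of the k largest losses, largest first."""
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]


# These indices are positions in the validation set.
idxs = top_loss_idxs(valid_losses, k=2)

# Indexing the validation paths gives the images that really had high loss.
right = [valid_paths[i] for i in idxs]

# Indexing the training paths with the same indices gives unrelated images
# that just happen to sit at the same positions.
wrong = [train_paths[i] for i in idxs]

print(right)
print(wrong)
```

If this picture is right, the indices only make sense for the dataset the losses were computed on, so swapping valid_ds for train_ds without also recomputing the losses on the training set would pick essentially arbitrary training images.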