In the part about cleaning up your dataset, Jeremy goes through this code:
losses,idxs = interp.top_losses() # returns the top losses in the
top_loss_paths = data.valid_ds.x[idxs]
Now we can pass in these paths to our widget.
fd = FileDeleter(file_paths=top_loss_paths)
He then mentions that we should run the same thing, but replace valid_ds with train_ds, to also clean up the training set. This doesn’t seem correct to me. The indices returned by interp.top_losses() are indices into the validation set, so if we only replace valid_ds with train_ds, we will select images from the training set based on the losses of validation images. It seems we should first modify something so that the interp instance is applied to the training set instead of the validation set (not sure how, though). Or am I misunderstanding something?
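To make the concern concrete, here is a minimal sketch in plain Python. It does not use fastai at all: the paths, the losses, and the helper top_loss_idxs are hypothetical stand-ins for what interp.top_losses() and the dataset would provide. It only illustrates that indices computed from one set's losses are meaningful for that set alone.

```python
# Toy stand-ins (hypothetical data, not fastai objects): per-item losses and
# file paths for a validation set and a training set.
valid_paths = ["valid/a.jpg", "valid/b.jpg", "valid/c.jpg"]
valid_losses = [0.1, 2.5, 0.7]
train_paths = ["train/p.jpg", "train/q.jpg", "train/r.jpg", "train/s.jpg"]
train_losses = [3.0, 0.2, 0.1, 1.9]


def top_loss_idxs(losses, k):
    """Return the indices of the k largest losses, largest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]


# Indices computed from the validation losses.
idxs = top_loss_idxs(valid_losses, 2)

# Consistent: validation indices used on validation paths.
correct = [valid_paths[i] for i in idxs]

# The mismatch described above: validation indices used on training paths.
# Here it happens to pick the two training images with the LOWEST losses.
mismatched = [train_paths[i] for i in idxs]

# What we actually want for the training set: indices from training losses.
train_idxs = top_loss_idxs(train_losses, 2)
wanted = [train_paths[i] for i in train_idxs]

print(correct)
print(mismatched)
print(wanted)
```

In this toy case the mismatched selection and the wanted selection share no images, which is the problem in miniature: the losses would have to be computed on the training set before indexing into it.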
I had the same problem with these versions:
fastai version : 1.0.14
torch version : 1.0.0.dev20181028
After upgrading, everything is OK.
Current versions:
fastai version : 1.0.18
torch version : 1.0.0.dev20181029
Hoy, thanks! Ah, yes. I forgot to mention that I am on Safari. You said adblocker, but in my preferences I only see settings for Content Blockers, Notifications, and Pop-up Windows.
Does anybody know where filenames are stored or how to modify data.show_batch() to display the file name as the image title?
I tried using the widget for deleting bad images that was demo’d in class, but it didn’t really work in my Jupyter Lab environment. In addition, I wanted to get a random batch of images rather than the most incorrect ones, because the model isn’t good enough yet for the most incorrect ones to be that useful (35% error rate). Given this, data.show_batch() already seems suited to the task. I just can’t seem to find where the file name is stored in the dataloader or dataset.
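For the "random batch instead of top losses" part, here is a rough sketch of the selection step only, in plain Python. It assumes the file paths are available as an indexable sequence, the way data.valid_ds.x is used in the snippet quoted earlier in the thread; whether your fastai version exposes them the same way for other datasets is an assumption. The helper random_paths and the example paths are hypothetical.

```python
import random


def random_paths(paths, n, seed=None):
    """Pick n distinct paths at random (all of them if fewer than n exist)."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    paths = list(paths)
    return rng.sample(paths, min(n, len(paths)))


# Hypothetical list standing in for the dataset's file paths.
all_paths = [f"train/img_{i}.jpg" for i in range(20)]

batch = random_paths(all_paths, 8, seed=0)
for p in batch:
    # Each path could be used as the title when the image is displayed.
    print(p)
```

Keeping the paths alongside the images means whatever displays them can label each one with its file name, and the same list can be handed to a review or deletion step afterwards.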
Can we get some tips on how to deal with a small number of samples when training an image classifier, or maybe a model in general? Also, can Jeremy repeat some details about rectangular images vs. square images in the training data?