ImageCleaner Variation

Hi,

tl;dr Using “ds, idxs = DatasetFormatter().from_toplosses(learn)”, how can I automatically remove the top x% of the returned indexes?

Background

I have a custom dataset that I know isn’t “clean”. My model peaks at 87% accuracy, which I’m pretty pleased with, but I want to improve it further.

To clean the data further and get better accuracy, I want to use the ImageCleaner widget. However, I wasn’t able to get it working in Colab, which is the environment I’m using for training.

Question

Since I can’t manually go through the data in order of top loss, I was wondering whether it is possible to automatically remove the top x% of the idxs returned from:

ds, idxs = DatasetFormatter().from_toplosses(learn)

and output ‘cleaned.csv’ accordingly?
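A minimal sketch of the idea, assuming `idxs` is ordered from highest to lowest loss and that you can get each item's file path and label in dataset order. The helper names `drop_top_fraction` and `write_cleaned_csv` are hypothetical, and the `name,label` columns with paths relative to the data folder are an assumption about the layout ImageCleaner's cleaned.csv uses, not something confirmed here.

```python
import csv
import os
from pathlib import Path


def drop_top_fraction(idxs, frac):
    """Return the set of indexes to drop: the first `frac` share of `idxs`.

    Assumes `idxs` is sorted from highest to lowest loss (our understanding
    of what from_toplosses returns). Works on a list or a 1-D tensor.
    """
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be between 0 and 1")
    n_drop = int(round(len(idxs) * frac))  # number of highest-loss items to discard
    return {int(i) for i in idxs[:n_drop]}


def write_cleaned_csv(paths, labels, idxs, frac, path, fname="cleaned.csv"):
    """Write a name,label CSV of every item except the top `frac` by loss.

    `paths[i]` and `labels[i]` must describe the same item that index `i`
    in `idxs` refers to. File names are stored relative to `path`
    (assumed to match the cleaned.csv layout ImageCleaner writes).
    """
    drop = drop_top_fraction(idxs, frac)
    out = Path(path) / fname
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "label"])
        for i, (p, label) in enumerate(zip(paths, labels)):
            if i in drop:
                continue  # skip the highest-loss items
            writer.writerow([os.path.relpath(p, path), label])
    return out
```

For example, `write_cleaned_csv(paths, labels, idxs, 0.05, path)` would drop the 5% of scored items with the highest loss. With fastai v1, `paths` and `labels` would presumably come from the returned `ds` (e.g. `ds.x.items` and `[str(y) for y in ds.y]`), but that mapping is an assumption to check against your fastai version. Note that the fraction is taken over `len(idxs)`, and that a high loss does not always mean a wrong label, so dropping blindly may also remove genuinely hard but correct examples.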
