Duplicate Widget

Hey, my question is: in cleaned.csv everything is included, rather than images of each category in separate folders. In that case, how can we recreate the ImageDataBunch from this cleaned.csv file?

Kaggle now seems to be using the latest version, so from_similars works fine. Thanks!

1 Like

You can create the ImageDataBunch from the csv like so (assuming your csv is in a folder named 'data'):
data = ImageDataBunch.from_csv('data',csv_labels='cleaned.csv',label_col=1,valid_pct=0.2,ds_tfms=get_transforms(), size=224, num_workers=0).normalize(imagenet_stats)
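If you're unsure what label_col=1 refers to, it helps to look at the file itself. A minimal sketch of the csv layout ImageCleaner writes (the column names and paths here are illustrative, not taken from an actual cleaned.csv):

```python
import csv, io

# cleaned.csv keeps one row per kept image:
# column 0 = path relative to the data folder, column 1 = label,
# which is why label_col=1 is passed to ImageDataBunch.from_csv.
sample = "name,label\nbears/black/001.jpg,black\nbears/teddy/002.jpg,teddy\n"
rows = list(csv.reader(io.StringIO(sample)))
header, body = rows[0], rows[1:]
print(header)      # ['name', 'label']
print(body[0][1])  # 'black' -> the value picked up via label_col=1
```

Opening your own cleaned.csv the same way is a quick sanity check before building the databunch.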

6 Likes

When I run the code ImageCleaner(ds, idxs, path) in Google Colab, it just keeps running and never ends, and I don’t see the images with the option to delete either. Is it only me, or is the problem that I am running it on Colab?

Google Colab will not run widgets, so you cannot use ImageCleaner within it.

2 Likes

I have implemented my code like this in Google Colab to clean my images. My error rate is only 1.7%, but I am new to data cleaning and want to see how to implement it.
db = (ImageList.from_folder(path)
.split_none()
.label_from_folder()
.transform(get_transforms(), size=224)
.databunch()
)
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2');
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
All of the snippets above execute perfectly. But when I try to execute the statement below, it takes a very long time and doesn’t respond to any interrupts. Can anyone tell me whether the statement below is correct, or whether I’ve gone wrong somewhere? Do help me.
ImageCleaner(ds, idxs, path)
@muellerzr You posted that Colab will not run widgets, so we can’t do image cleaning in it. Is that correct?

Yes, it’s a security thing with google. They will not allow external widgets to be used.

1 Like

Ok thanks @muellerzr. Actually I don’t have a GPU. Where can I get a free online GPU? Can you suggest a good option like Colab that allows widgets?

I do not. I don’t usually clean my dataset post-training unless it was downloaded automatically, as having ‘too clean’ data can be bad: if you rely on the model being wrong, delete those images, then try again, you won’t actually improve much if the deleted images were actually good. Does that make sense? In regards to the GPU, no. I do believe you can find free credit codes for Paperspace here on the forum if you dig enough, though.

Ok thanks @muellerzr, Very glad for your reply.

Thanks for the solution.

1 Like

@muellerzr

I just started the course last week, and the material has been updated so that the video is now out of sync with this new widget. Since I’m using Google Colab, I can’t run it, right?

What should I do? Skip this section to deployment? Will I be missing a lot of things?

Thanks

Just skip the section on the widget in steps, but listen to the why. The concept of data cleaning is important. The widget is one tool to do it, but there are other ways, and it’s important to know why.
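To make "other ways" concrete: the core idea behind the widget is just ranking predictions by loss and reviewing the worst ones by hand. A widget-free sketch of that idea (the losses and filenames below are made up; in fastai v1 you would get the real ones from the learner's top losses):

```python
# Rank items by loss, highest first, and pick the worst few to review.
losses = [0.02, 3.1, 0.4, 2.7, 0.1]
paths  = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]

worst_first = sorted(zip(losses, paths), reverse=True)
to_review = [p for _, p in worst_first[:2]]
print(to_review)  # ['b.jpg', 'd.jpg'] -> inspect these, delete if mislabelled
```

This loop of rank, inspect, relabel-or-delete is all the widget automates.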

2 Likes

@muellerzr WOW the reply is fast.

So in the real world… we should actually clean after we download, right?

I check my emails often :wink: In the real world, yes, you have two options. Clean before download (preferable and almost always the best), but if it’s too much to go through, you can quickly use Paperspace or another provider, or even your own local machine if you can, and run ImageCleaner to analyze whether your top losses were real photos or not, and delete them that way. ImageCleaner helps make that process less, well, awful. But it’s certainly not the only way.
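One practical tip when deleting by hand: moving flagged files to a quarantine folder instead of deleting them outright keeps the step reversible. A small sketch, with purely illustrative paths and folder names:

```python
import pathlib, shutil, tempfile

# Set up a throwaway folder with one stand-in "flagged" image.
root = pathlib.Path(tempfile.mkdtemp())
(root / "bears").mkdir()
bad = root / "bears" / "blurry.jpg"
bad.write_bytes(b"")  # placeholder for a flagged image file

# Move it to a quarantine folder rather than deleting it.
quarantine = root / "removed"
quarantine.mkdir()
shutil.move(str(bad), str(quarantine / bad.name))
print(sorted(p.name for p in quarantine.iterdir()))  # ['blurry.jpg']
```

If retraining shows the flagged images were actually fine, you can move them back.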

1 Like

@muellerzr

Oh, I got you… meaning I use Jupyter to run it, since Colab can’t do so,
and upload accordingly later…

That makes sense.

Exactly! :slight_smile: No problem, glad I could help.

1 Like

I am using the widget to clean the images; I have all data as the training set (4107 items). When I ran DatasetFormatter().from_similars(learn), I still ended up with a list of indexes that’s twice as long as the dataset, just matching each image with itself:

ds, idxs = DatasetFormatter().from_similars(learn)
len(idxs) is giving me 8214 items in the list,
but the dataset (ds) has only 4107 items.

Are the indexes doubled? Can anyone please clarify? When I run the ImageCleaner, it keeps showing duplicate images, so I wonder whether the images are getting compared with themselves.

Hey, what version are you using? Are you using it as indicated in the docs?
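If the doubling is from each image being paired with itself, one quick check is to read the indexes as consecutive pairs and see how many are self-matches. Note the pairing convention below is an assumption about how from_similars lays out its output, so verify against your version's docs first:

```python
# Made-up example: 8 indexes read as 4 (image, nearest-neighbour) pairs.
idxs = [0, 0, 1, 5, 2, 2, 3, 7]

pairs = list(zip(idxs[0::2], idxs[1::2]))
# Pairs where an image matched itself carry no information; drop them.
real_pairs = [p for p in pairs if p[0] != p[1]]
print(real_pairs)  # [(1, 5), (3, 7)]
```

If nearly every pair is a self-match, that would explain both the 2x length and the widget showing each image "duplicated".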