ImageCleaner missing argument in lesson 2 download notebook

That’s what I also end up doing: restarting the kernel. Maybe I should wait a bit longer, but so far I’ve never managed to get a result.
How much time (at least the order of magnitude) did it take for you when it worked?

I’m using GPU+ from Paperspace.
It took less than 5 minutes, more than 1.

Well, so far it’s been running for 16 minutes without any sign of progress. And I don’t even know what it’s spending all its time on…
And Colab GPU apparently has 20GB of memory, although I’m not certain of that.

@AlexeyRB I am working in Paperspace Gradient, trying a model on fruits, and was able to recreate your problem. I ran ImageCleaner(ds, idxs, Path('data/fruits')) and it hung for at least 5 minutes and caused weird errors. I had the filesystem open in another window, and when I tried to change folders I got “server error: forbidden”. Also, the top right corner of my notebook was displaying a red “Not Connected” message, a yellow “Forbidden”, and a white “Not Trusted”.

No idea what this means, but since we defined the path earlier, we can just use the line
ImageCleaner(ds, idxs, path=path) and it works fine for me. It must be something about the Path constructor you used. Let me know if that helps.

I would be surprised if that works, since path is defined exactly as Path('data/bears'). But I’ll try. It will take a few minutes to run the whole notebook again after restarting…

Apparently, ipywidgets like ImageCleaner do not work properly in Colab. I gathered this from searching around the forums for other problems I’m having with ImageCleaner.

For future reference, you may want to try searching the Platform: Colab thread for any issues you have.


@MadeUpMasters Unfortunately, changing:
ImageCleaner(ds, idxs, path=Path('data/bears')) to

path = Path('data/bears')
ImageCleaner(ds, idxs, path=path)

… does not seem to help, which makes sense, since path was already defined that way. At any rate, it’s been frozen for more than 7 minutes again.

@amqdn

That may be… but then no one working with Colab should be able to run this line. I’ll check tomorrow. I hope that’s not true: it would be a pity to move to a different platform just because of issues like these.

Thank you!

It’s true. No one using Colab seems to be able to run that line. If you go to that thread, open the search box, check “Search this topic,” and then search for “ImageCleaner”, you’ll see.

Hey, sorry for the false lead; that’s really frustrating. I hadn’t looked back at the declaration earlier in the notebook, so I didn’t notice that your approach was already used there, or it would have been obvious that it wouldn’t work.

Very weird that it broke everything for me when I had no problems the first time. I rebooted the notebook, ran it again with the original code (path=path), and it worked fine. It paused for about 60 seconds before loading, with 1,000 images in each of 4 classes. Very strange. Please post here if you find a solution.

Facing the same issue with Colab

Same issue with Colab as well. My workaround was to connect Colab to Google Drive:

from google.colab import drive
drive.mount('/content/gdrive')

Then download all the images to Drive and clean up the dataset manually in the Drive view.

Using

ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Train)

I’m able to see the images by simply indexing ds.x[idxs[0]], but with a large dataset, comparing images visually becomes cumbersome.
Please share how you retrieved the absolute paths.

I just compared them visually in Drive and discarded the “bad” ones. Maybe try:

ds.to_df().iloc[idxs[:10]]
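If you want the file paths rather than eyeballing images, the DataFrame returned by to_df should carry each file’s path relative to the dataset root, so joining it with the root gives absolute paths. A minimal stdlib sketch of the idea; the file names, indices, root folder, and the “name” column are all hypothetical stand-ins, not taken from this thread:

```python
import os

# Hypothetical stand-ins: in the real notebook, ds.to_df() would supply
# the relative paths and from_toplosses the index order.
names = ["black/001.jpg", "grizzly/007.jpg", "teddy/003.jpg"]
idxs = [2, 0]  # highest-loss images first (made-up values)
root = "data/bears"

# Map top-loss indices back to absolute paths on disk.
abs_paths = [os.path.abspath(os.path.join(root, names[i])) for i in idxs]
for p in abs_paths:
    print(p)
```

From there you can open, move, or delete the files directly instead of comparing them by eye.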


Yeah, I don’t want to do the visual comparison; I just want the absolute paths.

This is kind of an annoying workaround if you have a low-performance laptop running Linux.

  1. Install Anaconda3, then install fastai. This should be enough to run the Jupyter notebook locally.
  2. Use amatic’s suggestion of mounting Google Drive (then set the path to 'gdrive/fastai_data' or something similar):
     1. Run the notebook in Colab up to the point where you want to run the cleaner.
     2. Download the data and the notebook from Google Drive to your local machine.
     3. Run the notebook locally until you have created the “learn” object, but before doing any training (this is the part you don’t want to run on a low-performance machine).
     4. Then skip to learn.load('stage-2'). This should work just fine, since the weights should have been downloaded with the data.
     5. Run the cleaner locally.
     6. Upload “cleaned.csv” to Google Drive.
     7. Reload the dataset using “cleaned.csv” and continue working.
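For the last step: cleaned.csv as written by ImageCleaner should contain one row per image you kept, so reloading from it drops the deleted images. A hedged stdlib sketch of reading it back; the sample rows and the “name”/“label” column names are illustrative assumptions, and in the actual notebook you would rebuild the data bunch from the CSV rather than parse it by hand:

```python
import csv
import io

# Illustrative contents for cleaned.csv: only the rows you chose to keep
# remain after cleaning (column names assumed, not verified here).
cleaned_csv = io.StringIO(
    "name,label\n"
    "black/001.jpg,black\n"
    "teddy/003.jpg,teddy\n"
)

rows = list(csv.DictReader(cleaned_csv))
kept = [r["name"] for r in rows]               # images that survived cleaning
labels = {r["name"]: r["label"] for r in rows}  # their labels
print(kept)
```

In the notebook itself you would then recreate the data from this CSV (fastai v1 has CSV-based constructors such as ImageDataBunch.from_csv) and continue training on the cleaned set.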

I’m trying to run ImageCleaner on a folder containing ['train', 'valid'], with 2 categories in each ('hgg', 'met'), rather than a CSV file.
Do you need to run the cleaner for each folder separately?

I tried several configurations without success, e.g. running:

pathC = 'data/hggmetN2/train/hgg/'
ImageCleaner(ds, idxs, pathC)

It crashes on that call.

I’d appreciate your help!
Moran

I took care of this by setting a variable with the path:

path = ds.x.path

and then passing that to ImageCleaner:

ImageCleaner(ds, idxs, path)

You can see what’s in the dataset variable by printing it, if you want to display your path:

print(ds)


Thanks for your reply.
I tried the following:
ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)
path = ds.x.path
ImageCleaner(ds, idxs, path)

with path being the main folder, which contains the valid and train folders.
print(ds)

The cleaned.csv file was created in my path folder,
but the ImageCleaner GUI does not pop up, and I get the following output:

What did I do wrong?
Thanks a lot
Moran

I’m not sure. I see that in the notebook in the GitHub repo, someone fixed the missing path parameter in the ImageCleaner call. Maybe try pulling to get the updated notebook?

Thanks!