ImageCleaner missing argument in lesson 2 download notebook

I was running the lesson 2 download notebook, and when I reached the lines using ImageCleaner, I ran the statement ImageCleaner(ds, idxs) and got the following error:


TypeError Traceback (most recent call last)
in
----> 1 ImageCleaner(ds, idxs)

TypeError: __init__() missing 1 required positional argument: 'path'

It’s easy enough to fix as we defined path earlier, so I added path=path, but I didn’t know if this was something that needed to be fixed so thought I’d report it. Cheers to anyone who can help.
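The error itself can be reproduced without fastai. Below is a minimal sketch, assuming a stand-in class whose constructor requires `path` the way the traceback suggests. `StandInCleaner` and `try_build` are hypothetical names used only for illustration; this is not the fastai widget.

```python
from pathlib import Path


class StandInCleaner:
    """Hypothetical stand-in for illustration; NOT fastai's ImageCleaner."""

    def __init__(self, dataset, fns_idxs, path):
        self.dataset = dataset
        self.fns_idxs = fns_idxs
        self.path = Path(path)


def try_build(*args, **kwargs):
    """Return the instance, or the TypeError message if construction fails."""
    try:
        return StandInCleaner(*args, **kwargs)
    except TypeError as err:
        return str(err)


ds, idxs = ["a.jpg", "b.jpg"], [1, 0]   # placeholder data
path = Path("data/bears")

missing = try_build(ds, idxs)            # path omitted: yields the TypeError message
fixed = try_build(ds, idxs, path=path)   # path supplied: constructs normally
print(missing)
```

Passing `path=path` as a keyword, as in the fix described above, makes the call robust to the argument order.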

Same problem. But adding a third path argument does not help either. When I run this in Colab:

ImageCleaner(ds, idxs, Path('data/bears')) # notebook lesson2-download

… Colab just freezes, as if I had asked it to run an infinite loop.

I got it to work:
ImageCleaner(ds, idxs, path)

using the path I defined earlier in the notebook… I’m doing a set with birds:
folder = 'amkestrel'
file = 'urlsamkestrel.txt'
path = Path('Raptors')
dest = path/folder

It seems that path = Path('data/bears') should work just as well, or at least throw a meaningful error. But it freezes for me… Are you also working in Colab, by the way?
So, if it’s the path, where is your ‘Raptors’ path relative to your dataset?

The dataset is images in multiple folders, like folder = "amkestrel",
where the structure is
Raptors/amkestrel
Raptors/redtail
Raptors/osprey

etc…
I’m not sure if I’m doing it right, but it worked up until I wanted to improve my results and use the clean-up tool.
I was able to get to:
[image]

where the most confused were some difficult images, errors, or baby birds.

NOTE: maybe don’t listen to me… it worked to get past the cleaner step, but I’m having errors in the next step, so my workaround is temporary and not the right solution.

Still does not seem to work for me.
By the way, how much time does it take to create/load that ImageCleaner instance? (And how big is your dataset?)

About 1000 images.
The first time it hung…
I stopped and started the kernel again.
I was working on a cheaper GPU, then switched to a faster one.

Warning… the clean-up can take time… I wasn’t sure if I could exit and restart somewhere again…
Same as with the next duplicate step…
It took longer than I thought to actually click through all the batches. It would be nice to have an index of where you are in the top losses, and to be able to restart the clean-up if you have to leave and shut down the GPU…
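The resume idea mentioned above could look roughly like the following. This is only a sketch of the wished-for behaviour in plain Python, not something the fastai widget provides as far as this thread shows; `load_position`, `review_batches`, and the `progress.json` file are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def load_position(state_file):
    """Return the saved index into the top-losses list, or 0 if nothing is saved."""
    state_file = Path(state_file)
    if state_file.exists():
        return json.loads(state_file.read_text())["next"]
    return 0


def review_batches(idxs, state_file, batch_size=5, max_batches=None):
    """Yield batches from the saved position, recording progress after each one."""
    start = load_position(state_file)
    done = 0
    for i in range(start, len(idxs), batch_size):
        if max_batches is not None and done >= max_batches:
            return
        yield idxs[i:i + batch_size]
        # Reached only when the caller asks for the next batch,
        # i.e. after the current batch has been fully reviewed.
        Path(state_file).write_text(json.dumps({"next": i + batch_size}))
        done += 1


state = Path(tempfile.mkdtemp()) / "progress.json"
idxs = list(range(12))  # placeholder for the top-loss indices

first = list(review_batches(idxs, state, batch_size=5, max_batches=1))
resumed = list(review_batches(idxs, state, batch_size=5))  # picks up at index 5
```

A batch that is interrupted halfway is not recorded as done, so it is shown again after a restart.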

That’s what I also ended up doing: restarting the kernel. Maybe I should wait a bit longer, but so far I have never managed to get the result.
How much time (at least the order of magnitude) did it take for you when it worked?

I’m using GPU+ from Paperspace.
Less than 5 min, more than 1 min.

Well, so far it has taken 16 minutes without any sign of progress. And I don’t even know what it is spending all its time on…
And Colab GPU apparently has 20GB of memory, although I’m not certain of that.

@AlexeyRB I am working in Paperspace Gradient, trying a model on fruits, and was able to recreate your problem. I ran ImageCleaner(ds, idxs, Path('data/fruits')) and it hung for at least 5 minutes and caused weird errors. I had the filesystem open in another window, and when I tried to change folders I got “server error: forbidden”. Also, the top right corner of my notebook was displaying a red “Not Connected” message, a yellow “Forbidden”, and a white “Not Trusted”.

No idea what this means, but since we defined the path earlier, we can just use the line
ImageCleaner(ds, idxs, path=path) and it works fine for me. It must be something about the Path constructor you used. Let me know if that helps.

I would be surprised if it works, since path is defined exactly as Path('data/bears'). But I’ll try. It will take a few minutes to run the whole notebook again after restarting…

Apparently, ipywidgets like ImageCleaner do not work properly in Colab. I gathered this from searching around the forums for other problems I’m having with ImageCleaner.

For future reference, you may want to try searching the Platform: Colab thread for any issues you have.

@MadeUpMasters Unfortunately, changing:
ImageCleaner(ds, idxs, path=Path('data/bears')) to

path = Path('data/bears')
ImageCleaner(ds, idxs, path=path)

… does not seem to help, which makes total sense. At least, it has been frozen for more than 7 minutes again.

@amqdn

That can be… but then no-one working with Colab should be able to run this line. I’ll check that tomorrow. Hope that’s not true: would be a pity to move to a different platform just because of these kinds of issues.

Thank you!

It’s true. No one using Colab seems to be able to run that line. If you go to that thread, open the search box, check “Search this topic,” and then search for “ImageCleaner”, you’ll see.

Hey, sorry for the false lead, that’s really frustrating. I hadn’t looked back at the declaration earlier in the notebook, so I didn’t see that path was already defined your way. Otherwise it would’ve been obvious that my suggestion wouldn’t work.

Very weird that it broke everything for me when I had no problems the first time. I rebooted the notebook, ran it again with the original code (path=path) and it worked fine. Paused for about 60 seconds before loading, 1000 images each for 4 classifications. Very strange. Please post here if you find a solution.

Facing the same issue with Colab

Same issue with Colab as well. My workaround was to connect Colab to Google Drive:

from google.colab import drive
drive.mount('/content/gdrive')

Then download all the images to Drive, and manually clean up the dataset in the Drive view.

Using

ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Train)

I’m able to see the images by simple indexing: ds.x[idxs[0]]. But with a large dataset, comparing images visually will become cumbersome.
Please share how you retrieved the absolute paths.

I just compared them visually in the drive, and discarded the “bad” ones. Maybe…

ds.to_df().iloc[idxs[:10]]
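For the absolute-paths question, one possible approach is sketched below in plain Python. It assumes you already have a sequence of image file paths (`items`) in the same order the indices refer to; in fastai v1 that may be available as `ds.x.items`, but treat that as an assumption to verify. `top_loss_paths`, `write_keep_list`, and the one-column CSV layout are hypothetical and are not the format of any file fastai writes.

```python
import csv
import tempfile
from pathlib import Path


def top_loss_paths(items, idxs, n=10):
    """Return absolute paths for the first n indices in idxs."""
    return [Path(items[int(i)]).resolve() for i in list(idxs)[:n]]


def write_keep_list(items, drop, csv_path):
    """Write every item not in `drop` to a one-column CSV; return the count kept."""
    drop = {Path(p).resolve() for p in drop}
    kept = [Path(p).resolve() for p in items if Path(p).resolve() not in drop]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name"])
        writer.writerows([str(p)] for p in kept)
    return len(kept)


# Placeholder data standing in for the dataset's file list and top-loss indices.
items = [
    Path("data/bears/teddy/0.jpg"),
    Path("data/bears/teddy/1.jpg"),
    Path("data/bears/grizzly/2.jpg"),
]
idxs = [2, 0, 1]

worst = top_loss_paths(items, idxs, n=2)   # paths to inspect by hand
csv_path = Path(tempfile.mkdtemp()) / "keep.csv"
kept = write_keep_list(items, worst, csv_path)
```

Here every inspected path is dropped for simplicity; in practice you would pass only the files you judged to be bad.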