Bulk Images Not Downloading - duckduckgo - part 0

I can download individual images, but when I run the code below it stalls. It has been running for at least 5 minutes with no output, and restarting the machine didn’t help. How do I get it working, and is there a way to see its progress? I’m running this on a Paperspace Gradient PyTorch

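from time import sleep
from fastai.vision.all import *  # provides Path, download_images, resize_images
# search_images is the helper defined earlier in the course notebook

searches = 'forest','bird'
path = Path('bird_or_not')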

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    sleep(10)  # Pause between searches to avoid over-loading server
    download_images(dest, urls=search_images(f'{o} sun photo'))
    sleep(10)
    download_images(dest, urls=search_images(f'{o} shade photo'))
    sleep(10)
    resize_images(path/o, max_size=400, dest=path/o)

Hey!

Unfortunately, there’s no “easy” way to get progress bars in this case. Normally, you create progress bars at the point where the iteration happens, i.e., in your case, where each url is downloaded. But this happens inside the download_images function.

So, to get progress bars, you would need to change download_images a tiny bit. Luckily, this is not that complicated!

You only need to import fastprogress and wrap the iterator inside download_images in a progress bar.

Here’s a full working example. If you don’t understand everything, feel free to just use the code. That’s totally fine when starting out!

Edit: Updated link after making notebook public
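The linked notebook is not reproduced here, so the following is only a rough sketch of the idea described above: move the per-URL loop into your own function so that progress can be reported where the iteration happens. It uses only the standard library and is not fastai's implementation. The names `download_with_progress` and `fetch_bytes`, the `.jpg` extension, and the 4-second timeout are all assumptions made for illustration.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url, timeout=4):
    """Download one URL and return its raw bytes (standard library only)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_with_progress(dest, urls, fetch=fetch_bytes):
    """Hypothetical stand-in for download_images that reports every URL.

    Returns the number of files saved; failed downloads are skipped.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    urls = list(urls)
    saved = 0
    # This is the loop a progress bar would wrap; here it prints one line per URL.
    for i, url in enumerate(urls, start=1):
        try:
            data = fetch(url)
            # Assumption: every file is saved with a .jpg extension.
            (dest / f'{i:04d}.jpg').write_bytes(data)
            saved += 1
            status = 'ok'
        except Exception as e:
            status = f'failed ({type(e).__name__})'
        print(f'[{i}/{len(urls)}] {status}: {url}', flush=True)
    return saved
```

In the original loop this would be called as `download_with_progress(dest, search_images(f'{o} photo'))`. If fastprogress is installed, wrapping `urls` in its `progress_bar` should draw a bar in place of the printed lines.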


I would create a counter variable and, inside the for loop, insert:

print(counter)
counter = counter + 1

In this way you know it is still working :)
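A variation on the counter idea, as a minimal sketch: print a line before and after each slow call, with the elapsed time, so you can tell "slow" apart from "stuck". `timed_step` is a hypothetical helper name, not part of fastai.

```python
import time


def timed_step(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs), printing when it starts and how long it took."""
    print(f'start: {label}', flush=True)
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f'done:  {label} ({time.perf_counter() - t0:.1f}s)', flush=True)
    return result
```

Inside the loop from the question, a call might look like `timed_step(f'{o} photo', download_images, dest, urls=search_images(f'{o} photo'))`. Note that the search itself runs before the timer starts, because its result is passed in as an argument.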


It turned out to just be very slow. When I waited about 10 minutes it worked. 5 was too short.

That link gave a 404

Oops, didn’t know Kaggle notebooks were private by default. Fixed now!