Bulk Images Not Downloading - duckduckgo - part 0

blluecricket · July 28, 2023, 9:45am

I can download individual images, but when I run the code below it stalls. It has been running for at least 5 minutes with nothing to show for it. I’ve also tried restarting the machine. How do I get it working, and is there a way to see its progress? I’m running this on a Paperspace Gradient Pytorch

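from time import sleep
from fastai.vision.all import *  # provides Path, download_images, resize_images

# search_images is assumed to be defined earlier in the notebook
searches = 'forest','bird'
path = Path('bird_or_not')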

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    sleep(10)  # Pause between searches to avoid overloading the server
    download_images(dest, urls=search_images(f'{o} sun photo'))
    sleep(10)
    download_images(dest, urls=search_images(f'{o} shade photo'))
    sleep(10)
    resize_images(path/o, max_size=400, dest=path/o)

UmerAdil · July 28, 2023, 2:25pm

Hey!

Unfortunately, there’s no “easy” way to get progress bars in this case. Normally, you create a progress bar at the point where the iteration happens, which in your case is where each URL is downloaded. But that loop runs inside the download_images function, so you can’t wrap it from your own code.

So, to get progress bars, you would need to change download_images a tiny bit. Luckily, this is not that complicated!

You only need to import fastprogress and wrap the iterator inside download_images in a progress bar.

Here’s a full working example. If you don’t understand everything, feel free to just use the code. That’s totally fine when starting out!

Edit: Updated link after making notebook public
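
The linked notebook is not reproduced in this thread. As a rough sketch of the idea described above (this is neither the notebook's code nor fastai's actual download_images), a sequential downloader could wrap its URL loop in fastprogress's progress_bar. The names download_with_progress and fetch_bytes, the numbered .jpg filenames, and the 4-second timeout are all assumptions made for illustration. If fastprogress is not installed, the sketch falls back to plain iteration with no bar.

```python
from pathlib import Path
from urllib.request import urlopen

try:
    from fastprogress.fastprogress import progress_bar
except ImportError:
    # Fallback so the sketch still runs without fastprogress: plain iteration, no bar.
    def progress_bar(iterable, **kwargs):
        return iterable


def fetch_bytes(url, timeout=4):
    """Hypothetical helper: return the raw bytes served at `url`."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_with_progress(dest, urls, fetch=fetch_bytes):
    """Download each URL into `dest`, advancing a progress bar once per URL.

    Illustrative stand-in for download_images; it downloads one URL at a time.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    # Wrapping the iterable is what produces the progress bar.
    for i, url in enumerate(progress_bar(list(urls))):
        try:
            data = fetch(url)
        except Exception as e:
            # A dead link should not stop the whole batch.
            print(f"skipped {url}: {e}")
            continue
        out = dest / f"{i:08d}.jpg"  # assumed naming scheme
        out.write_bytes(data)
        saved.append(out)
    return saved
```

In the original loop this would be called as download_with_progress(dest, search_images(f'{o} photo')) in place of download_images. Because it fetches one URL at a time, it may well be slower than download_images; the trade-off is visibility.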

DanielW · July 28, 2023, 6:38pm

I would create a counter variable (set to 0 before the loop) and, inside the for loop, insert:

print(counter)
counter = counter + 1

That way you know it is still working.
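
A minimal sketch of that counter idea applied to the loop from the first post, with elapsed time added so a slow run can be told apart from a stalled one. The function name run_queries and the download_one callback are hypothetical; download_one stands in for the download_images(dest, urls=search_images(...)) call.

```python
import time


def run_queries(searches, variants, download_one, log=print):
    """Run download_one(o, query) for every search/variant pair,
    logging a running count after each one finishes."""
    total = len(searches) * len(variants)
    counter = 0
    start = time.monotonic()
    for o in searches:
        for v in variants:
            query = f"{o} {v}"
            download_one(o, query)
            counter = counter + 1
            elapsed = time.monotonic() - start
            log(f"{counter}/{total} done: {query!r} ({elapsed:.0f}s elapsed)")
    return counter


# Possible use with the names from the first post (not run here):
# run_queries(searches, ('photo', 'sun photo', 'shade photo'),
#             lambda o, q: download_images(path/o, urls=search_images(q)))
```

This only reports progress between searches, not per image, so a single long search will still look silent until it completes.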

blluecricket · July 30, 2023, 3:02pm

It turned out to just be very slow. When I waited about 10 minutes it worked. 5 was too short.

blluecricket · July 30, 2023, 3:03pm

That link gave a 404

UmerAdil · July 31, 2023, 10:14am

Oops, didn’t know Kaggle notebooks were private by default. Fixed now!