(Previous threads about download_images are old and closed, e.g. this one, so I'm starting a new one.)
fastai.vision.utils.download_images() hangs [on Colab and locally] on the "teddy bear" URLs I get from DuckDuckGo (but not on the "grizzly bear" or "black bear" URLs) when I try to download from a list longer than roughly 200 URLs.
Even with a timeout of 4 seconds, downloading 300 images sequentially should take at most 300 × 4 = 1200 seconds, i.e. 20 minutes, right? Yet for me it never completes, and changing the timeout= kwarg to download_images() has no effect.
Can anyone help me resolve this?
Here is a Minimum (Not) Working Example in a Colab notebook:
(I'm finding the "layered API" daunting: I really just want to insert print statements throughout the download_images code to see what it's doing, but it's a routine that calls another routine that calls another routine…)
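In the meantime, one way I found to narrow things down without reading the fastai internals was to probe each URL individually, outside of download_images, with a plain socket-level timeout so the culprit identifies itself. This is just a sketch; `probe` is my own helper, not part of the fastai API:

```python
# Probe each URL one at a time with a socket-level timeout, so a
# hanging URL is identified instead of stalling the whole batch.
import urllib.request

def probe(url: str, timeout: float = 4.0) -> bool:
    """Return True if the URL answers with at least one byte within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # just the first byte; we only care about liveness
        return True
    except Exception:
        return False  # unreachable, refused, malformed, or timed out

# usage sketch: find the suspects before calling download_images
# bad_urls = [u for u in urls if not probe(u)]
```

This is slow (one URL at a time) but deterministic, which is exactly what you want when bisecting a hang.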
UPDATE: There is one problematic URL, https://magpies-gifts.co.uk/media/catalog/product/cache/2/image/1200x1200/9df78eab33525d08d6e5fb8d27136e95/0/3/039096.jpg, which DuckDuckGo returns as an image URL, but if you navigate there you get a PHP page saying "Magpies Gifts is now closed". This is where download_images hangs.
How could we make download_images robust to this kind of problem, e.g. so that it "really" times out?
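My current guess at why timeout= has no effect: a socket-level timeout only guards the connection and the gap between bytes, so a server that keeps trickling bytes (or a redirect to a slow PHP page) can still hang the download indefinitely. A sketch of a true wall-clock cap, using stdlib concurrent.futures rather than anything from fastai (`fetch_image` and `run_with_deadline` are hypothetical helper names of my own):

```python
# Enforce a hard wall-clock deadline on a download by running it in a
# worker thread and abandoning it if it exceeds the budget.
import concurrent.futures
import urllib.request

def fetch_image(url: str) -> bytes:
    # The socket-level timeout guards connect and first byte; it does
    # NOT cap total time if the server trickles bytes slowly.
    with urllib.request.urlopen(url, timeout=4) as resp:
        return resp.read()

def run_with_deadline(fn, deadline: float = 10.0):
    # future.result(timeout=...) enforces a wall-clock cap regardless
    # of how the underlying socket behaves.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = pool.submit(fn)
    try:
        return fut.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        return None  # give up on this URL and move on
    finally:
        # Don't block on the stuck worker; it is abandoned and lingers
        # until the interpreter exits.
        pool.shutdown(wait=False)

# usage sketch:
# img_bytes = run_with_deadline(lambda: fetch_image(url), deadline=10.0)
```

If something like this were wrapped around each download inside download_images, one dead URL couldn't stall the whole batch. Is there a cleaner way to do this within fastai's existing machinery?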