(Previous threads about download_images are old and closed, e.g. this one, so I'm starting a new one.)
fastai.vision.utils.download_images() hangs [on Colab and locally] on the "teddy bear" URLs I get from DuckDuckGo (but not on the "grizzly bear" or "black bear" URLs) when I try to download from a list longer than roughly 200 URLs.
Even with a timeout of 4 seconds, downloading 300 images sequentially should take at most 300 × 4 = 1200 seconds, i.e. 20 minutes, right? Yet for me it never completes, and changing the timeout= kwarg to download_images() has no effect.
Can anyone help me resolve this?
Here is a Minimum (Not) Working Example in a Colab notebook:
(I'm finding the "layered API" daunting: I really just want to insert print statements throughout the download_images code to see what it's doing, but it's a routine that calls another routine that calls another routine…)
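In the meantime, one way I found to narrow things down without reading the fastai internals was to probe each URL individually, outside of download_images, with a plain socket-level timeout so the culprit identifies itself. This is just a sketch; `probe` is my own helper, not part of the fastai API:

```python
# Probe each URL one at a time with a socket-level timeout, so a
# hanging URL is identified instead of stalling the whole batch.
import urllib.request

def probe(url: str, timeout: float = 4.0) -> bool:
    """Return True if the URL answers with at least one byte within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # just the first byte; we only care about liveness
        return True
    except Exception:
        return False  # unreachable, refused, malformed, or timed out

# usage sketch: find the suspects before calling download_images
# bad_urls = [u for u in urls if not probe(u)]
```

This is slow (one URL at a time) but deterministic, which is exactly what you want when bisecting a hang.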
UPDATE: There is one problematic URL, https://magpies-gifts.co.uk/media/catalog/product/cache/2/image/1200x1200/9df78eab33525d08d6e5fb8d27136e95/0/3/039096.jpg, which DuckDuckGo returns as an image URL, but if you navigate there you get a PHP page saying "Magpies Gifts is now closed". This is where download_images hangs.
How could we make download_images robust to this kind of problem, e.g. so that it "really" times out?
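My current guess at why timeout= has no effect: a socket-level timeout only guards the connection and the gap between bytes, so a server that keeps trickling bytes (or a redirect to a slow PHP page) can still hang the download indefinitely. A sketch of a true wall-clock cap, using stdlib concurrent.futures rather than anything from fastai (`fetch_image` and `run_with_deadline` are hypothetical helper names of my own):

```python
# Enforce a hard wall-clock deadline on a download by running it in a
# worker thread and abandoning it if it exceeds the budget.
import concurrent.futures
import urllib.request

def fetch_image(url: str) -> bytes:
    # The socket-level timeout guards connect and first byte; it does
    # NOT cap total time if the server trickles bytes slowly.
    with urllib.request.urlopen(url, timeout=4) as resp:
        return resp.read()

def run_with_deadline(fn, deadline: float = 10.0):
    # future.result(timeout=...) enforces a wall-clock cap regardless
    # of how the underlying socket behaves.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = pool.submit(fn)
    try:
        return fut.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        return None  # give up on this URL and move on
    finally:
        # Don't block on the stuck worker; it is abandoned and lingers
        # until the interpreter exits.
        pool.shutdown(wait=False)

# usage sketch:
# img_bytes = run_with_deadline(lambda: fetch_image(url), deadline=10.0)
```

If something like this were wrapped around each download inside download_images, one dead URL couldn't stall the whole batch. Is there a cleaner way to do this within fastai's existing machinery?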