Google images corrupted after download

Hello everyone,

I am currently working through Lesson 2, attempting to download images from Google.

I successfully downloaded the URLs I needed from Google Images using the following code:

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

The problem is that when I use 'download_images(path/file, dest, max_pics=20, max_workers=0)', the images appear to download to the path. However, when I try to view them there is often no image to be displayed.

This happens even when I have verified that there is an image by opening the URL in a separate window. Why is the image not displaying from the path? Is there something I am doing wrong?

Did you run the part to clean up corrupted images?

Also, could you try transferring/downloading one of the invisible images to your desktop? Does it open properly there?

Restarting the kernel has sometimes helped me, for reasons I don't know.

I can run the cleanup code, but that would be highly inefficient because all the files seem to be corrupted. For some reason, 'download_images' is corrupting the images.

I did a quick check and took each url manually, I was able to download the images without a problem. Given that my dataset has several classes, it is not ideal to do this manually for each URL in each class.

Also, I restarted my Kernel, but that did not seem to help either.

Just to understand your problem:

However, when I try to view them there is often no image to be displayed.

all the files seem to be corrupted

So is no image working at all, or do some display?

I did a quick check and took each url manually, I was able to download the images without a problem.

Yes, the URLs are good, but can you confirm that the image files are actually broken? You might want to download one from your instance to your host machine/laptop and see if it opens in a regular image viewer.

Did you change any of the code in the exercise? If so, can you post the code?

What environment are you using?

When I did this exercise, I was confused by going back and forth / up and down to reset the path / file for each of the bears. I ended up rewriting that as a loop to avoid mistakes. Something like:
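One way to check whether the downloaded files are really broken, without copying them to another machine, is to look at their first few bytes. The sketch below is only an illustration: `sniff_image_type` and `report_folder` are hypothetical helper names, not part of fastai, and the check only recognises the JPEG, PNG, GIF and WebP signatures. A file that reports None may be an HTML error page or a truncated download rather than an image.

```python
from pathlib import Path
from typing import Dict, Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image format from the leading bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP: a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def report_folder(folder: Path) -> Dict[str, Optional[str]]:
    """Map each file name in `folder` to its sniffed type (None = unrecognised)."""
    report = {}
    for p in sorted(folder.iterdir()):
        if p.is_file():
            with p.open("rb") as f:
                report[p.name] = sniff_image_type(f.read(16))
    return report
```

Calling `report_folder(dest)` on one class folder would show at a glance whether every file is unrecognised or only some of them.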

for folder in ['black','teddy','grizzly']:
    dest = path/folder
    dest.mkdir(parents=True, exist_ok=True)
    file = f"urls_{folder}.csv"
    download_images(path/file, dest, max_pics=200)
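Before running a loop like the one above, it may help to confirm that each URL file exists and actually contains URLs, since a wrong path or an empty file is an easy mistake to make. This is a rough sketch under the same naming assumption (`urls_<folder>.csv` inside `path`); `url_file_summary` is a hypothetical helper, not a fastai function.

```python
from pathlib import Path
from typing import Dict, List, Optional


def url_file_summary(path: Path, folders: List[str]) -> Dict[str, Optional[int]]:
    """Count non-blank lines in each urls_<folder>.csv; None means the file is missing."""
    summary = {}
    for folder in folders:
        url_file = path / f"urls_{folder}.csv"
        if not url_file.exists():
            summary[folder] = None
            continue
        lines = url_file.read_text(encoding="utf-8").splitlines()
        summary[folder] = sum(1 for line in lines if line.strip())
    return summary
```

For example, `url_file_summary(path, ['black', 'teddy', 'grizzly'])` returns one count per class, so a missing or empty file stands out before any download starts.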
    

Hi @bmrobins,

Kindly modify the code: you need to add the .jpg extension.

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('.jpg\n')));

I added .jpg before \n.

I had a similar issue after downloading all the images, so I modified the code so that the images have a .jpg extension.

Maybe @jeremy can modify this code in the notebook, if required.
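If the images have already been downloaded without an extension, an alternative to editing the URL list might be to rename the files afterwards. The sketch below assumes the missing extension is the only problem and that the files really are JPEGs; `add_missing_suffix` is a hypothetical helper, not part of fastai.

```python
from pathlib import Path
from typing import List


def add_missing_suffix(folder: Path, suffix: str = ".jpg") -> List[str]:
    """Rename files in `folder` that have no extension; return the new names."""
    renamed = []
    # sorted() materialises the listing first, so renaming is safe while looping.
    for p in sorted(folder.iterdir()):
        if not p.is_file() or p.name.startswith(".") or p.suffix:
            continue  # skip directories, hidden files, and files with an extension
        target = p.with_name(p.name + suffix)
        if not target.exists():  # never overwrite an existing file
            p.rename(target)
            renamed.append(target.name)
    return renamed
```

Running `add_missing_suffix(dest)` once per class folder leaves files that already have an extension untouched.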