Lesson 2 - Creating your own dataset from Google Images

JesperR · April 23, 2019, 2:40pm

This question is a rewrite of a question posted yesterday. It is rewritten in hope that someone can help, so that I move on to the next lesson.

I try to run the Creating your own dataset from Google Images notebook, but I keep getting errors.

It is no problem getting the images and creating the folders for each bear.

When I run path.ls() I get the following output:

[PosixPath(‘data/bears/teddys’),

PosixPath(‘data/bears/black’),

PosixPath(‘data/bears/grizzly’)]

I am supposed to run:

download_images(path/file, dest, max_pics=200)

but that returns:

FileNotFoundError: [Errno 2] No such file or directory: ‘data/bears/urls_grizzly.csv’

Why is that?

According to the lecture, download_images(path/file, dest, max_pics=200) is used to download

images to, in my case Google Cloud. But when I downloaded the images using

the JavaScript, the images was saved to my Mac. How do I get the images from my Mac to

Google cloud using download_images(path/file, dest, max_pics=200)?

Thank you.

farhad · April 24, 2019, 4:05pm

Hi Jesper,
You need to actually upload the csv files onto your Google Cloud Instance. download_images is looking for those files and they don’t exist on the machine.

The easiest way to do this is just use the Jupyter notebook UI. Go do your data/bears/ directory and hit the upload button to upload the csv files with the image urls.

This is mentioned with an image in the lesson2-download.ipynb

If you’ve done it correctly, when you run path.ls() you should see it also list the csv files.

Hope that helps.

JesperR · May 8, 2019, 6:12pm

Thank you for replying to my question.

Everything works fine until I run

download_images(path/file, dest, max_pics=200)

When I do, I get the following error

FileNotFoundError Traceback (most recent call last)

<ipython-input-12-e85756baeaa4> in <module>

----> 1 download_images(path/file, dest, max_pics=200)

/opt/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py in download_images(urls, dest, max_pics, max_workers, timeout)

191 def download_images(urls:Collection[str], dest:PathOrStr, max_pics:int=1000, max_workers:int=8, timeout=4):

192 "Download images listed in text file urls to path dest, at most max_pics"

–> 193 urls = open(urls).read().strip().split("\n")[:max_pics]

194 dest = Path(dest)

195 dest.mkdir(exist_ok=True)

FileNotFoundError: [Errno 2] No such file or directory: ‘data/bears/urls_grizzly.csv’

…

I also tried running

download_images(path/file, dest, max_pics=20, max_workers=0)

But that returns the same error.

For further information, I have uploaded all the files, and run all the cells in order (as suggested by Jeremy) and for each bear type.

Folder

Path

Classes.

I use Google Cloud.

I see the error is No such file or directory. But why? As written all cells are run as suggested.

Can anyone please suggest a solution?

Thank you

dipam7 · May 19, 2019, 2:35pm

Hey You need to download the urls, save them in a text file and upload it on your cloud. Check my notebook on Kaggle if it helps.

muellerzr · May 19, 2019, 3:01pm

Also to add on, if it says it can’t find a specific file, one thing I do is a path.ls() to see what exactly is IN that directory I want my file to be at. I am just assuming this could be the case, due to the FileNotFound error

AjayStark · May 19, 2019, 3:56pm

Hi, i tried the same thing, downloaded the urls, saved them as csv files in desktop and uploaded them using the command

from google.colab import files
files.upload()

and when i tried to run the line "download_images(path/file, dest, max_pics=200) "
i was returned with " IsADirectoryError "

What to do?

Thank you.

hrishibhat · May 23, 2019, 1:08pm

Hi. For all working in Google Cloud(Colab). I had the same problem. Follow this. It worked for me.

Firstly, In the video Jeremy uses Jupyter so his path and folders are different.
So google cloud uses drive to store files. Do the following

The first code should be added at the start.

Go to your Gdrive and make sure you have the following folder
‘My Drive/fastai-v3’
If not create a fastai-v3 folder inside My Drive.

Also edit the path as shown above.

Then please change the file extensions as txt files if u have downloaded in txt format.
2Capture

This should start the download. Let me know if it works for you.

AjayStark · May 27, 2019, 6:16pm

Hi, i tried to download by directly specifying the file name (i.e blackbears.txt) and it worked for me.

i have a doubt in opening images part in production.
img = open_image(path/‘blackbears.txt’/‘00000021.jpg’)
In the above code, how do i know there is a file with name “00000021.jpg” and what are the other names?
@hrishibhat

hrishibhat · May 27, 2019, 6:37pm

Are doing it on Google cloud colab or jupyter notebook ?

dreambeats · May 28, 2019, 8:09am

I can recommend a library that does the job quite conveniently. The relevant code is quite short too:
First, pip install google_images_download. Then in Python:

from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
args = {
        'keywords':'your search query,
        'exact_size':"350,350",
        'limit':50,
        'output_directory':'images',
        'silent_mode':True,
        'type':'photo',
       }
p = response.download(args)

Your images will be saved in a directory called ‘images’, and they will be segmented into different folders based on your search query. Feel free to consult me further if you run into problems.

Somy · June 6, 2019, 10:10pm

Hi, I followed your steps but still I get :
FileNotFoundError: [Errno 2] No such file or directory: ’ /content/gdrive/My Drive/fastai-v3/data/bears/grizzly.txt’

What should I do??!

hrishibhat · June 7, 2019, 2:28am

Did u copy the txt file into your drive in the above mentioned folder?

Somy · June 7, 2019, 3:45am

Yed I did. But when I run the line 25, I got the error:
IsADirectoryError: [Errno 21] Is a directory: ‘/content/gdrive/My Drive/fastai-v3/data/black’

???

hrishibhat · June 7, 2019, 4:43am

Could u post screenshot of ur code + your ‘data’ folder in gdrive

tedmasterweb · June 7, 2019, 9:06am

Hi,

I’ve run the example and pruned the input data deleting images that were not of the right type (I am trying to detect buses).

The instructions say to "recreate your ImageDataBunch from your cleaned.csv" but, if I’m not mistaken, I need to change ImageDataBunch.from_folder to .from_csv. I’m wondering if someone can help me with the syntax. I’m reading the documentation but I’m still to new to understand it :-(.

So, given this:

data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

How do I convert it to .from_csv()?

data = ImageDataBunch.from_csv(path, csv_labels=path/'cleaned.csv', train=".", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

Do I just need to add csv_lables or is there something else?

Many thanks in advance!

Ted Stresen-Reuter

Prowton · June 8, 2019, 11:56am

Addressing FileNotFoundError in lesson 2.

I had this problem but found a way. I use google cloud platform (GCP).
To address the problem,

rename your text files to from teddys.txt to urls_teddys.csv
upload the urls_teddys.csv into the tutorials/fastai/course-v3/nbs/dl1/data/bears folder of your jupyter tutorial notebook
Re-run the download_images(path/file, dest, max_pics=200) command.
extra check your downloaded content in one of the folders in tutorials/fastai/course-v3/nbs/dl1/data/bears

Please I started the course last week and this is my first post. Pardon me for wrong formatting or if I am posting to the inappropriate location.
Cheers.

AjayStark · June 9, 2019, 3:00pm

Hi, i tried to create dataset using datablock api, but it says ImageList has no such atribute ‘transform’

I used this code:
src=(ImageList.from_folder(path)
.split_by_rand_pct(0.2))

data=(src.transform(tfms, size=224).databunch(bs=48).normalize(imagenet_stats))

@Prowton

Prowton · June 9, 2019, 3:51pm

Hi @AjayStark

Thanks for sharing your code snippet. I was able to reproduce the error. The problem was a missing .label_from_folder() call.
Please try
src = (ImageList.from_folder(path).split_by_rand_pct(0.2))

data = (src.label_from_folder().transform(get_transforms(), size=224) .databunch(bs=48).normalize(imagenet_stats))
Hope that helps. Cheers

inspiredu · June 10, 2019, 5:15am

Did you solve your issue?
I have an error exactly like you.
No helhelper here

Prowton · June 10, 2019, 6:50am

Yes I did.
If you use Google cloud platform, I may be able to assist you