Stuck on images /folders --> ImageDataBunch object

hey everyone! i have used the https://github.com/hardikvasa/google-images-download application to download 1000 images: 20 folders of 50 images each.
in keeping with the cricket/baseball theme :slight_smile:

shell command:
$ googleimagesdownload --keywords "cricket stadiums" -sk 'melbourne, dharamshale, lucknow ,hyderabad, rajkot,visakhapatnam, dehradun, chennai, nagpur,guwahati, wankhede, chandigarh,bahria town, islamabad barsapara, sylhet international, karachi, ekana, raipur, pallekele, ipl' -l 50

the locations were headings in google images, shown as city or regional tags, i think!

i have the photos on google drive linked to colab … just have not been able to progress from there.

the folder names would be my ‘annotations’ (20ea)
and each folders contents would be my ‘images’ (50ea)

data = loop through the folders and create an array/object of 1000 images, each marked with its specific geographic tag.
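The loop described above can be sketched in plain Python, with no fastai needed. This is only a rough illustration: the function name `collect_labelled_images` and the list of accepted file extensions are assumptions, not anything from the course notebooks.

```python
from pathlib import Path

# Assumed set of image file types; adjust to whatever the downloader saved.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_labelled_images(root):
    """Return a list of (image_path, label) pairs.

    The label of each image is the name of the folder it sits in.
    """
    samples = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files next to the class folders
        for image in sorted(folder.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image, folder.name))
    return samples
```

Calling `collect_labelled_images` on the download folder should give one entry per image, so about 1000 pairs for 20 folders of 50 images.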

not very strong with python… i am probably well over my head in taking this course, but it's very interesting.
in fact i am retired, but love to ‘piddle’ with programming :slight_smile:

Since you have arranged your data in folders can you not use the ImageDataBunch.from_folder method in a similar way to that used in lesson2_download.ipynb ?

Hi @douglas ,

It seems you are running into the same problem I struggled with a couple of days ago. My project was less ambitious, as I was working with only 2 categories instead of 20.
If you follow the link below, you will find the steps I went through to successfully load the images. I was also working in Colab, so hopefully this will help.

thanks for the advice @raimanu-ds. with new data, i have gotten up to where i call ImageDataBunch.from_folder.
here is the current link to the notebook:

https://colab.research.google.com/drive/1Pzza_0GMokWsBMq5IVLZDc_p6WkiO6n-

tia for the hand holding :slight_smile:

Hi :slight_smile: :

I have my data in the same format you had. How did you arrange the data into train and valid folders? Is there a method or function in fastai that can do that? Did you do it by hand?

Thanks!!!

Hey @nanote :wave:t3:

I manually arranged my data into train and valid folders, as I don’t know if such methods or functions are already implemented in Fastai. Maybe @jeremy or @sgugger could tell us :angel:t3:

Nevertheless I will try to write a Python script (to practice writing code :laughing:) as I have already started compiling more data for my project and it will be tedious once I start adding more categories.
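A script like the one mentioned above could look roughly like this. It is a sketch under assumptions, not the script that was actually written: the name `split_train_valid`, the `train`/`valid` output layout, and the default 20% validation share are all illustrative choices.

```python
import random
import shutil
from pathlib import Path


def split_train_valid(src, dst, valid_pct=0.2, seed=42):
    """Copy src/<label>/* into dst/train/<label>/ and dst/valid/<label>/.

    Files are copied, not moved, so the original download stays intact.
    `dst` should live outside `src`.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    src, dst = Path(src), Path(dst)
    for folder in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in folder.iterdir() if f.is_file())
        rng.shuffle(files)
        n_valid = int(len(files) * valid_pct)
        for i, f in enumerate(files):
            split = "valid" if i < n_valid else "train"
            target = dst / split / folder.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target / f.name)
```

Splitting inside each class folder keeps every category represented in both sets, which matters when each class only has around 50 images.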

@douglas, I have sent you a request to be able to access your Colab nb.

Thanks @raimanu-ds!

Please @jeremy or @sgugger let us know. Otherwise I’ll create a script.

I also requested access to your nb @douglas :slight_smile:

@nanote I think these ‘from_’ functions have an argument called valid_pct to create a validation set from the train folder.
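For reference, here is a minimal sketch of that approach, modelled on the call used in the lesson 2 download notebook and assuming fastai v1. The wrapper function `load_stadium_data` and its default values are made up for illustration; only the `ImageDataBunch.from_folder` call itself comes from the library.

```python
def load_stadium_data(path, valid_pct=0.2, size=224):
    """Build an ImageDataBunch from one-folder-per-class data (fastai v1)."""
    # Imported here so fastai is only needed when the function is called.
    from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

    return ImageDataBunch.from_folder(
        path,
        train=".",            # class folders sit directly under `path`
        valid_pct=valid_pct,  # hold out this fraction at random for validation
        ds_tfms=get_transforms(),
        size=size,
    ).normalize(imagenet_stats)
```

With this, no separate train and valid folders are needed: pointing `path` at the folder that contains the 20 class folders should be enough.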

In general, you should use the data block API if you want more flexibility to assemble your data. The factory methods of ImageDataBunch are great when you begin, but they won’t fit with all the situations you can be in.

Thanks for the advice. Now that you mention the data block API, I remember seeing tweets about it, and more specifically tweets mentioning this article, which explains how it works.