How do I extract label names from folders?


(Shaimay Shah) #1

A question similar to mine has been asked before but I couldn’t quite understand the proposed solutions.
I’m using this dataset http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz. I’ve managed to uncompress it as well. However, I encounter a problem while using ImageDataBunch.from_name_re. The file system is as follows:

…/images/label/___.jpg

What do I pass in the ImageDataBunch function that allows me to extract these label names from the folder names? In another thread, I saw an explanation on how to do it recursively using glob but I couldn’t quite understand it.

Thanks in advance.


(Benjamin van der Burgh) #2

It’s probably easiest to use the data block API. Jeremy explains the idea behind it in fast.ai lesson 3.

If you look at the table of contents of the data block API, you can see there are a few discrete steps. In your case, the label is provided as the folder name, so you will probably want to use label_from_folder.
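To make the idea of labelling from folder names concrete, here is a minimal plain-Python sketch (no fastai involved) of the recursive glob approach. The function name `labels_from_folders` is hypothetical, and it assumes every image is a .jpg sitting directly inside a folder named after its label:

```python
from pathlib import Path

def labels_from_folders(images_dir):
    """Map each .jpg under images_dir to the name of its parent folder."""
    images_dir = Path(images_dir)
    # rglob searches recursively, so every label sub-folder is visited.
    return {p: p.parent.name for p in sorted(images_dir.rglob("*.jpg"))}
```

This is roughly what label_from_folder does for you: the label of each file is the name of the folder that contains it.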

The other steps depend on how you want to load your data and split training and test sets, etc.


#3

You could also try ImageDataBunch.from_folder, which takes the labels from the folder names.


(Kieran) #4

Hey Shaimay

You are pretty close! I downloaded the data set and made a data bunch out of it. Here is how I got it to work.

I set path to /CUB_200_2011/CUB_200_2011 (note that in my example I have a folder called data where I store all my datasets).
Then I created an ImageDataBunch with ImageDataBunch.from_folder, where the path is path/images, size is 224 and valid_pct is 0.2.
I didn’t add tfms and I didn’t call normalize() with imagenet_stats, but you could do both.

See my code below.
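A minimal sketch of the steps described above, assuming fastai v1. The function name `make_cub_databunch` and its `root` argument are hypothetical; `root` stands for wherever the extracted CUB_200_2011 folder lives on your machine:

```python
from pathlib import Path

def make_cub_databunch(root, size=224, valid_pct=0.2):
    """Build an ImageDataBunch whose labels are the sub-folder names of root/images."""
    # Imported inside the function so it can be defined without fastai installed.
    from fastai.vision import ImageDataBunch

    path = Path(root)
    # With valid_pct set, a random 20% of the images is held out for validation.
    data = ImageDataBunch.from_folder(path / 'images', valid_pct=valid_pct, size=size)
    # Optional (assumption, not used here): pass ds_tfms=get_transforms() above
    # and call data.normalize(imagenet_stats) on the result.
    return data
```

After building it, data.classes should list the folder names that were picked up as labels.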

Hope this helps, I definitely find this part difficult!