A question similar to mine has been asked before, but I couldn’t quite understand the proposed solutions.
I’m using this dataset: http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz. I’ve managed to download and uncompress it. However, I run into a problem when using ImageDataBunch.from_name_re. The directory structure is as follows:

What do I pass to ImageDataBunch so that it extracts the label names from the folder names? In another thread I saw an explanation of how to do this recursively using glob, but I couldn’t quite understand it.
Thanks in advance.
It’s probably easiest to use the data block API. You can watch the fastai Lesson 3 video, where Jeremy explains the idea behind it.

If you look at the table of contents of the data block API docs, you’ll see there are a few discrete steps. In your case the label is provided by the folder name, so you will probably want to use label_from_folder.

The other steps depend on how you want to load your data, split it into training and validation sets, and so on.
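To make the folder-name labelling concrete: label_from_folder simply uses each image’s parent directory name as its label. Here’s a small standard-library sketch of that idea (the tiny directory tree below is made up for illustration):

```python
from pathlib import Path
import tempfile

# Build a tiny fake dataset tree: images/<class_folder>/<file>.jpg
root = Path(tempfile.mkdtemp()) / 'images'
for cls in ['001.Black_footed_Albatross', '002.Laysan_Albatross']:
    (root / cls).mkdir(parents=True)
    (root / cls / 'img_0001.jpg').touch()

# The core of folder-based labelling: a file's label is its parent folder's name
labels = {p: p.parent.name for p in root.rglob('*.jpg')}
for p in sorted(labels):
    print(p.name, '->', labels[p])
```

In fastai v1 the equivalent chain would be something like `ImageList.from_folder(path).split_by_rand_pct(0.2).label_from_folder()` followed by `.transform(...)` and `.databunch()`.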
You are pretty close! I downloaded the data set and made a data bunch out of it. Here is how I got it to work.
I set path to /CUB_200_2011/CUB_200_2011 (note: in my example I have a folder called data where I store all my datasets).

Then I created the data bunch with ImageDataBunch.from_folder, passing path/images as the path, size=224, and valid_pct=0.2.

I didn’t add any tfms, but you could, and I didn’t call normalize() with imagenet_stats, but you could do that too.
See my code below.
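Something like this (a sketch, assuming fastai v1; the path reflects my setup above, so adjust it to wherever you unpacked the archive):

```python
from pathlib import Path

# Adjust to wherever you unpacked the archive (this reflects my setup above).
path = Path('data/CUB_200_2011/CUB_200_2011')

try:
    from fastai.vision import ImageDataBunch, imagenet_stats

    # Each subfolder of path/'images' names one class; valid_pct=0.2 holds
    # out 20% of the images for validation, and size=224 resizes them.
    data = ImageDataBunch.from_folder(path/'images', valid_pct=0.2, size=224)
    data = data.normalize(imagenet_stats)  # optional, as noted above
except Exception:
    # fastai v1 (or the dataset) isn't available in this environment;
    # the from_folder call above is the part that matters.
    data = None

print(path/'images')
```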
Hope this helps, I definitely find this part difficult!