I am trying to use ImageDataBunch.from_csv() on some downloaded Kaggle data. I’m going to try to use lesson 1 to create my model! I’m very excited.
The docs say that the file structure needs to look like this:
My question is where do the images themselves go? As it stands, I have separated the training images into train\ and the test images into
Also, Kaggle names the training labels file “train.csv”. Do I need to change that to “labels.csv” or is that just a generic file name in the docs?
This is what my directory looks like
all my test images
all my train images
Will this work?
EDIT: I tried the above, and it looks like it is searching for the image files in the data\ folder when I call show_batch. So that means that’s where the images go. If that’s the case, what’s the point of having the separate training and testing folders?
EDIT 2: I see that you should provide the full path to the images in labels.csv, so that means I need to edit that whole column to have train\image_0xxx.jpg. I did that, and then I tried show_batch, but it only shows a single label for each image. Since there are multiple classifications, shouldn’t it show them all?
It would really help if you could share your DataBlock API call, but let me take a guess and try to help you.
Whatever path you provide as a Path object is searched recursively to grab data. In general, you should provide the path to your train folder, which will probably have a sub-folder for each class. Make sure that you have the same folder structure in your test folder. To answer your question, the point of having separate folders is that you can create a data bunch from a DataBlock API call by passing your train data and then passing your test folder in the same call to add_test_folder. You can use the splitter in the DataBunch to further create a validation set, which will be a small chunk of your original train folder. Now when you do show_batch, it will only pull data from the train folder, and only from the subset chosen as train; validation will not be touched.
For the target labels, you can simply load the csv as a dataframe and pass the relevant parameters for label_from_df in the API call. I don’t think you need to edit the whole column, but if you choose to, a simple concatenate call over the column is all you need.
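If you do decide to edit the filename column, the concatenation could look roughly like the sketch below. It uses Python's standard csv module instead of a dataframe so it runs without extra packages. The column names (image_name, class1, class2), the sample rows, and the helper prefix_filename_column are all made up for illustration, and it uses train/ with a forward slash, so adjust it to your own file.

```python
import csv
import io


def prefix_filename_column(rows, column, prefix):
    """Return copies of the rows with `prefix` prepended to rows[column].

    Values that already start with `prefix` are left alone, so running
    the function twice does not produce "train/train/...".
    """
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if not row[column].startswith(prefix):
            row[column] = prefix + row[column]
        out.append(row)
    return out


# Tiny in-memory stand-in for the labels file (hypothetical contents).
sample_csv = (
    "image_name,class1,class2\n"
    "image_0001.jpg,1,0\n"
    "image_0002.jpg,0,1\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
fixed = prefix_filename_column(rows, "image_name", "train/")

# Write the updated rows back out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["image_name", "class1", "class2"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(fixed)
updated_csv = buffer.getvalue()
print(updated_csv)
```

With a pandas dataframe, the same idea is a single string concatenation over the column, along the lines of df["image_name"] = "train/" + df["image_name"].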
I recommend reading https://docs.fast.ai/data_block.html
Hi @PranY, thank you for the response!
I am simply following lesson 1, step for step, just on a different dataset. I don’t know what a DataBlock is yet, since I’m only on lesson 1. However, I did read in the documentation for ImageDataBunch.from_csv() that the label column is set to 1 by default. Since I have multiple label columns, I just needed to pass label_col=['class1','class2', ...] and everything worked fine.
I’m still new to being able to understand documentation well, but I finally found it.
Now, I will dig into reading about data blocks! Thanks so much!
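For anyone landing here later, the call that ended up working could be sketched as below. This is a rough sketch assuming fastai v1, where, as far as I can tell from the docs, ImageDataBunch.from_csv also accepts csv_labels (so train.csv does not need renaming) and folder (so the filename column does not need the train\ prefix). The helpers from_csv_kwargs and make_data_bunch, the data path, and the column names class1 and class2 are placeholders, not anything from the library.

```python
def from_csv_kwargs(label_columns, csv_name="train.csv", image_folder="train"):
    """Collect keyword arguments for ImageDataBunch.from_csv (fastai v1).

    csv_labels: name of the labels file inside the data path
    folder:     sub-folder of the data path that holds the images
    label_col:  one or more label columns (the default is column 1)
    """
    return {
        "csv_labels": csv_name,
        "folder": image_folder,
        "label_col": list(label_columns),
    }


def make_data_bunch(data_path, label_columns, **extra):
    """Build the data bunch; needs fastai v1 installed."""
    # Imported lazily so from_csv_kwargs can be used without fastai.
    from fastai.vision import ImageDataBunch

    kwargs = from_csv_kwargs(label_columns)
    kwargs.update(extra)  # e.g. size=224 or ds_tfms=...
    return ImageDataBunch.from_csv(data_path, **kwargs)


# Placeholder column names; replace with the ones in your train.csv.
kwargs = from_csv_kwargs(["class1", "class2"])
print(kwargs)

# With fastai v1 installed and the images under data/train:
# data = make_data_bunch("data", ["class1", "class2"], size=224)
# data.show_batch(rows=3)
```

Passing a list to label_col is what makes show_batch display more than one label per image, as described above.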