Load train and validation sets from different CSV files

jainds · January 21, 2019, 9:31am

I have two CSV files and two folders. One folder contains the images for my training set and the other the images for my validation set. Each folder has its own CSV file listing its images and labels.

I have tried to create an ImageItemList for each CSV, but I do not know how to mark one item list as the training set and the other as the validation set.

Can you please help me tackle this problem without moving all the images into a single folder?

Thanks!

edwardjross · January 21, 2019, 9:57am

How about reading each CSV file into its own pandas DataFrame, adding a valid column with the value True for the validation set and False for the training set, then concatenating the two into a single DataFrame df?

You may also need to prefix each filename with the folder it is in, so that all the paths are relative to the current directory.
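The preparation described above can be sketched roughly as follows. This is only an illustration under assumptions: the column names name and label, the folder names train_images and valid_images, and the in-memory stand-ins for the two CSV files are all hypothetical, so substitute your own.

```python
import io
import pandas as pd

# Stand-ins for the two CSV files; in practice pass the real file paths
# to pd.read_csv. Column names and folder names here are hypothetical.
train_csv = io.StringIO("name,label\ncat1.jpg,cat\ndog1.jpg,dog\n")
valid_csv = io.StringIO("name,label\ncat2.jpg,cat\n")

train_df = pd.read_csv(train_csv)
valid_df = pd.read_csv(valid_csv)

# Flag which split each row belongs to.
train_df["valid"] = False
valid_df["valid"] = True

# Prefix each filename with its folder so all paths share a common root.
train_df["name"] = "train_images/" + train_df["name"]
valid_df["name"] = "valid_images/" + valid_df["name"]

# Stack the rows into one DataFrame with a fresh 0..n-1 index.
df = pd.concat([train_df, valid_df], ignore_index=True)
```

The resulting df has the filename in the first column, the label in the second, and the valid flag in the third, so the split column can be referred to by name.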

Then you can use the data block API, with something like:

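from fastai.vision import *  # fastai v1

data = (ImageItemList
        .from_df(df, path='.')        # filenames read from the first column, relative to path
        .split_from_df(col='valid')   # rows where valid is True form the validation set
        .label_from_df()              # labels read from the second column by default
        .transform(get_transforms())
        .databunch()
        .normalize(imagenet_stats))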

jainds · January 21, 2019, 10:29am

Thank you very much for the solution; it worked. Can we similarly add a test set from a CSV file or DataFrame?

edwardjross · January 21, 2019, 9:55pm

Good question; data_block doesn’t seem as good for adding a test set: https://docs.fast.ai/data_block.html#Add-a-test-set.

I haven’t actually tried this (I don’t have the right code handy), but it looks like you could load the test CSV into test_df and do something like:

...
label_from_df().\
add_test(ImageItemList.from_df(test_df)).\
transform(get_transforms()).\
...

Let us know whether this works.

jainds · January 22, 2019, 7:53am

I tried it, but it didn’t work. I got an error: IndexingError: Too many indexers