Labels from folder, validation from csv?

greta · June 28, 2019, 12:49pm

Hi, I have a bunch of images that I just want to do basic classification on. The images are in separate folders according to label (as in Lesson 1 of the course). Those folders are contained in a parent folder. However, now I’d like to tell the model which images to use as a validation set. I have a csv file (also in the parent folder) with the names of all the images that I wish to use as a validation set (the list contains a bunch of file names from each of the categories). I can’t find a way to get the labels from the folders and the validation split from a csv (or txt or whatever). The documentation doesn’t say much about the .split_from_whatever() methods (even though they are used in Lesson 3 of the course). The data block API thing is completely incomprehensible to me and I don’t understand what goes with what on that webpage…

Here is what I’m trying:

data = (ItemList.from_folder(path)
        .split_by_fname_file(path + 'valid.csv')
        .label_from_folder()
        .databunch(bs=bs)
        .normalize(imagenet_stats))

I really need to do this for work. Please use layman language, I’m a complete noob.

kushaj · June 28, 2019, 8:10pm

When all else fails, just create a csv file with all the filenames and a column with valid=True for your validation images. I suggest using the split_by_idxs and from_csv methods.

greta · July 1, 2019, 12:12pm

Thank you for your answer. Can I please just clarify: how do I then indicate the labels? Should I still split my images into folders by label or do I need to say what labels they are in the csv file somehow? And if the latter, how do I store these labels? Thanks for your time!

kushaj · July 1, 2019, 5:12pm

No, you don’t need to split your images into folders. The label goes in the second column of the csv table; look at the docs. If the third column has valid=True for an image, that image is included in the validation set; otherwise it goes into the training set.
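
If the images are already sorted into label folders, the csv described here does not have to be written by hand. Below is a minimal sketch that builds it using only the Python standard library. It rests on assumptions that are not stated in the thread: valid.csv is taken to hold one bare file name per line with no header, and the function name build_labels_csv, the output name labels.csv, the column names, and the list of image extensions are all made up for illustration.

```python
import csv
from pathlib import Path

# Assumption: these are the image types present; adjust as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def build_labels_csv(root, valid_list="valid.csv", out_csv="labels.csv"):
    """Write one row per image: relative path, label (folder name), is_valid."""
    root = Path(root)

    # Assumption: valid.csv has one bare file name per line and no header row.
    with open(root / valid_list, newline="") as f:
        valid_names = {
            row[0].strip() for row in csv.reader(f) if row and row[0].strip()
        }

    rows = []
    # "*/*" matches files exactly one folder below root, i.e. root/<label>/<image>.
    for img in sorted(root.glob("*/*")):
        if img.is_file() and img.suffix.lower() in IMAGE_EXTS:
            rows.append(
                (
                    img.relative_to(root).as_posix(),  # first column: file path
                    img.parent.name,                   # second column: label
                    img.name in valid_names,           # third column: validation flag
                )
            )

    with open(root / out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "label", "is_valid"])
        writer.writerows(rows)
    return rows
```

With fastai v1, a file laid out like this could then probably be loaded with something along the lines of ImageList.from_csv(path, 'labels.csv').split_from_df(col=2).label_from_df(cols=1), followed by the usual .databunch(bs=bs).normalize(imagenet_stats); check the data block docs for the exact arguments in your version. Note that the sketch matches validation images by file name only, so two images with the same name in different label folders would both be marked as validation.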