Split data into Train, Test and Validation Data

jayrodge · June 6, 2019, 6:57pm

I have all the images in a folder and their labels in a label.csv file [file_name, multi-labels]. I want to split this data into train, test, and validation sets.

I used the data block API, but it only splits the data into train and validation sets. This is the code I used:

    data = (ImageList.from_df(labels[['file', 'label']].dropna(), path)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=';')
           .transform(tfms, size=128)
           .databunch())

How do I split the same data into Train, Test, and Validation sets?
Is there an existing function/parameter for this?

Thanks in advance

Pomo · June 6, 2019, 7:48pm

Hi Jay. Typically an unlabelled Test set is provided by the machine learning practitioner, or by the problem statement, or by the competition. Please see here for options for adding a Test set to the DataBunch:

If you want to split out your own Test set, IMHO the easiest way is to learn how to use pandas DataFrames. With a DataFrame you can load the CSV, take a random sample as the Test set, write the sample and the remainder to new CSV files, move image files, etc. Then fastai will split the remaining set into Training and Validation.
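
A minimal sketch of that pandas approach, in case it helps. The column names `file` and `label` are taken from the snippet in the question; the helper `split_off_test`, its `test_frac` and `seed` parameters, the output file names, and the toy DataFrame standing in for label.csv are all made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_off_test(labels, test_frac=0.2, seed=42):
    """Hypothetical helper: hold out a random fraction of rows as a test set.

    Returns (train_valid, test); the two frames share no rows.
    """
    # Randomly sample the held-out rows; random_state makes the split repeatable.
    test = labels.sample(frac=test_frac, random_state=seed)
    # Everything that was not sampled stays available for train/validation.
    train_valid = labels.drop(test.index)
    return train_valid, test


# Toy stand-in for pd.read_csv('label.csv'); real data would be loaded from disk.
labels = pd.DataFrame({
    "file": [f"img_{i}.jpg" for i in range(20)],
    "label": ["cat;dog" if i % 2 else "cat" for i in range(20)],
})

train_valid, test = split_off_test(labels.dropna(), test_frac=0.2)

# Write both parts out as new CSV files (a temp directory is used here only
# so the sketch does not touch your working directory).
out_dir = Path(tempfile.mkdtemp())
train_valid.to_csv(out_dir / "train_valid.csv", index=False)
test.to_csv(out_dir / "test.csv", index=False)
```

The `train_valid` frame can then be passed to `ImageList.from_df(...)` in place of the full `labels` frame, so that `split_by_rand_pct` only divides the remaining rows into Training and Validation.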