Split data into Train, Test and Validation Data

Hi Jay. Typically an unlabelled Test set is provided by the machine learning practitioner, or by the problem statement, or by the competition. Please see here for options for adding a Test set to the DataBunch:

If you want to split out your own Test set, IMHO the easiest way is to learn how to use DataFrames (Pandas). With a DataFrame you can load the CSV, take a random sample, write that sample to a new CSV, move the corresponding files, etc. fastai can then split the remaining data into Training and Validation sets.
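Here is a rough sketch of the Pandas approach, in case it helps. It assumes your labels live in a DataFrame with a unique index (the default after `pd.read_csv`). The helper name `split_off_test`, the 20% fraction, the column names, and the file names in the comments are all made up for illustration, so adjust them to your data.

```python
import pandas as pd


def split_off_test(df, test_frac=0.2, seed=42):
    """Hold out a random fraction of rows as a test set.

    Returns (remaining, test). Assumes df has a unique index,
    so dropping by index removes exactly the sampled rows.
    """
    test = df.sample(frac=test_frac, random_state=seed)
    remaining = df.drop(test.index)
    return remaining, test


# Toy stand-in for something like: df = pd.read_csv("labels.csv")
df = pd.DataFrame({
    "name": [f"img_{i}.jpg" for i in range(10)],
    "label": [i % 2 for i in range(10)],
})

remaining, test = split_off_test(df)

# Hypothetical output files: one for the held-out test rows, one for
# the rows you hand to fastai for the train/validation split.
# test.to_csv("test.csv", index=False)
# remaining.to_csv("train_valid.csv", index=False)
```

Fixing `random_state` keeps the split reproducible, so the same rows stay held out every time you rerun the notebook.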
