Clarification on Training/Validation/Testing Sets

Teth-Adam · April 21, 2021, 6:42pm

Hi all - I was wondering, does the fastai toolkit randomly choose a different training, validation, and test set each time you train the neural net, or does it use the same images every time?

BresNet · April 22, 2021, 7:49am

When constructing the DataLoaders, fastai uses a split function. IIRC, the default is RandomSplitter, which randomly divides your data into a train and a valid dataset in an 80/20 ratio. Each time you build the DataLoaders, the split is drawn again at random, so you end up with different datasets. You can avoid this by passing a seed to the RandomSplitter, so the train and valid datasets remain the same, provided the underlying data does not change. There are various other Splitters, depending on your task, which allow you to define the validation dataset explicitly, e.g., via an extra column in your dataframe, a marker in the file name, or the folder structure.
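To make the effect of the seed concrete, here is a minimal sketch in plain Python. It is not fastai's code, only an illustration of the idea under the assumption that the splitter shuffles item indices and cuts them at the validation percentage. The function name `seeded_random_split` and its parameters are made up for this example.

```python
import random

def seeded_random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_idxs, valid_idxs) for n_items shuffled indices.

    Hypothetical helper for illustration only, not part of fastai.
    """
    rng = random.Random(seed)  # seed=None gives a different shuffle on each call
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    # The first `cut` shuffled indices become validation, the rest training.
    return idxs[cut:], idxs[:cut]

# Same seed, same data size: the split is reproduced exactly.
train_a, valid_a = seeded_random_split(100, seed=42)
train_b, valid_b = seeded_random_split(100, seed=42)
print(valid_a == valid_b)          # True
print(len(train_a), len(valid_a))  # 80 20
```

In fastai itself the equivalent would presumably be something like `RandomSplitter(valid_pct=0.2, seed=42)` passed as the splitter; without a seed, every rebuild can give a different split.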

BobMcDear · April 23, 2021, 3:10pm

To elaborate on what BresNet said, fastai never creates a test set for you; you have to set one aside manually if you’d like one. Please take a look at these two for more information.
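One way to set a test set aside by hand, as a rough sketch: split off a fixed share of the items before anything is handed to fastai, then build the train/valid split from the remainder only. This is plain Python with no fastai calls; the helper `hold_out_test_set`, the 10% share, and the file names are all assumptions for illustration.

```python
import random

def hold_out_test_set(items, test_pct=0.1, seed=0):
    """Shuffle a copy of `items` with a fixed seed and split off a test set.

    Hypothetical helper for illustration only, not part of fastai.
    """
    rng = random.Random(seed)
    shuffled = list(items)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(test_pct * len(shuffled))
    # Returns (items for train/valid, items held out for testing).
    return shuffled[cut:], shuffled[:cut]

files = [f"img_{i:03d}.jpg" for i in range(50)]  # made-up file names
trainval_files, test_files = hold_out_test_set(files)
print(len(trainval_files), len(test_files))  # 45 5
```

Only `trainval_files` would then go into the DataLoaders; `test_files` stays untouched until the final evaluation. If I remember correctly, fastai can wrap such held-out items afterwards with `dls.test_dl(...)`.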