Hi all - I was wondering, does the fastai toolkit randomly choose a different training, validation, and test set every time you train the neural net, or does it use the same images every time?
When constructing the DataLoaders, fastai uses a split function. IIRC, the default is `RandomSplitter`, which randomly divides your data into a train and a valid dataset in an 80/20 ratio. Each time you build the DataLoaders, the split is drawn again at random, so you end up with different datasets. You can avoid this by passing a seed to the `RandomSplitter`, so the train and valid datasets stay the same as long as the underlying data does not change. There are various other splitters, depending on your task, which allow you to define the validation dataset explicitly, e.g., by an extra column in your dataframe (`ColSplitter`), by putting it into the file name, or by folder structure (`GrandparentSplitter`).
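To illustrate why the seed matters, here is a minimal plain-Python sketch of a seeded random split. It is not fastai's actual implementation; `seeded_random_split` is a hypothetical helper I made up for illustration, and it only mimics the idea of shuffling the item indices and cutting off a validation fraction. In fastai itself, the equivalent would be something like `RandomSplitter(valid_pct=0.2, seed=42)`.

```python
import random

def seeded_random_split(n_items, valid_pct=0.2, seed=None):
    """Hypothetical helper: return (train_idxs, valid_idxs) for n_items items."""
    # With seed=None the shuffle differs on every call;
    # with a fixed seed the same shuffle is reproduced each time.
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)  # number of validation items
    return idxs[cut:], idxs[:cut]

# Same seed, same split, as long as the number (and order) of items is unchanged
train_a, valid_a = seeded_random_split(100, seed=42)
train_b, valid_b = seeded_random_split(100, seed=42)
print(valid_a == valid_b)           # True
print(len(train_a), len(valid_a))   # 80 20
```

Note that the split is over item indices, so if you add, remove, or reorder files, the same seed can still put different images into the validation set.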