How is fastai implementing training, test, and validation sets?

niketb · November 23, 2018, 1:56am

I understand the role of training, test, and validation sets, and I have read Rachel’s blog post on the topic. However, I am confused about how fast.ai treats the “train” and “valid” directories in the Cats and Dogs example in lesson 1. Am I correct in thinking the following?

The model is trained on images in the “train” directory, and its performance is measured against the images in the “valid” directory.
No cross-validation is implemented, so the model is never trained on images in the “valid” directory.
If I want to use 5-fold cross-validation and check for overfitting, I will have to create separate “train” and “test” directories, and within “train” implement cross-validation by splitting the data into 5 folds and rotating which fold serves as the valid set while the other 4 are used for training.

My training set is non-temporal, and consists of different malware and benign files, so it looks like a good use case for cross-validation.

nok · November 23, 2018, 3:43pm

I think you can use split by idx, but in that case you should put all your files in one folder. Then simply create the validation indices for each of the 5 folds.
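
Something like this could generate the fold indices. It is only a rough sketch in plain Python, not fastai code: `make_fold_indices` and its parameters are names made up for illustration, and it assumes the files sit in one folder in a fixed order so that position `i` always refers to the same file. A held-out test set, if you want one, would need to be set aside before this step.

```python
import random

def make_fold_indices(n_items, n_folds=5, seed=42):
    """Return a list of (train_idx, valid_idx) pairs, one pair per fold.

    Hypothetical helper for illustration; not part of fastai.
    """
    indices = list(range(n_items))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    splits = []
    for k in range(n_folds):
        # Fold k is the validation set; the remaining folds form the training set.
        valid_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train_idx, valid_idx))
    return splits

# Example: 10 files, 5 folds.
splits = make_fold_indices(n_items=10, n_folds=5)
for k, (train_idx, valid_idx) in enumerate(splits):
    print(f"fold {k}: {len(train_idx)} train, {len(valid_idx)} valid")
```

Each `valid_idx` list would then be handed to the split-by-index step, training a fresh model per fold. The exact fastai call depends on the library version, so it is left out here.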