How is fastai implementing training, test, and validation sets?


(Niket Bhodia) #1

I understand the role of training, test, and validation sets, and I have read Rachel’s blog post on the topic. However, I am confused about how fast.ai treats the “train” and “valid” directories in the Cats and Dogs example in lesson 1. Am I correct in thinking the following?

  1. The model is trained on images in the “train” directory, and its performance is measured against the images in the “valid” directory.
  2. No cross-validation is implemented, so the model is never trained on images in the “valid” directory.
  3. If I want to use 5-fold cross-validation and check for overfitting, I will have to create separate “train” and “test” directories, and within “train” implement cross-validation by splitting the data into 5 folds and rotating them, so that each run trains on 4 folds and validates on the remaining one.

My training set is non-temporal and consists of various malware and benign files, so it looks like a good use case for cross-validation.


(nok) #2

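I think you can use split by idx, but in that case you should put all your files in one folder instead of separate “train” and “valid” directories. Then simply create 5 sets of fold indices and use a different one as the validation set on each run.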
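
Something like this could generate the fold indices. It is only a rough sketch in plain Python: `k_fold_indices`, `n_items`, and `seed` are names I made up for illustration, not part of fastai. The idea is that you would pass each `valid_idxs` list to whatever index-based split your fastai version offers, and train one model per fold.

```python
import random


def k_fold_indices(n_items, k=5, seed=42):
    """Return k (train_idxs, valid_idxs) pairs covering range(n_items).

    Hypothetical helper for illustration; not a fastai function.
    """
    idxs = list(range(n_items))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(idxs)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idxs[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        # Fold i is held out for validation; the other k-1 folds are for training.
        valid_idxs = sorted(folds[i])
        train_idxs = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        splits.append((train_idxs, valid_idxs))
    return splits


# Example: 100 files in a single folder, indexed 0..99 in a fixed file order.
splits = k_fold_indices(n_items=100, k=5)
for fold, (train_idxs, valid_idxs) in enumerate(splits):
    print(f"fold {fold}: {len(train_idxs)} train, {len(valid_idxs)} valid")
```

Two caveats. The indices only mean something if the file list is in the same order every time, so sort it before indexing. Also, this split is not stratified: if malware and benign files are imbalanced, a stratified splitter such as scikit-learn's `StratifiedKFold` would probably be a better way to build the folds. If you want a true test set, hold those files out before creating the folds.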