I understand the role of training, validation, and test sets, and I have read Rachel’s blog post on the topic. However, I am confused about how fast.ai treats the “train” and “valid” directories in the Cats and Dogs example in lesson 1. Am I correct in thinking the following?
- The model is trained on images in the “train” directory and its performance is measured against the images in the “valid” directory
- No cross-validation is implemented, so the model is never trained on images in the “valid” directory
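- If I want to use 5-fold cross-validation and check for overfitting, I will have to create separate “train” and “test” directories, and implement cross-validation within “train” by splitting it into 5 folds, training on 4 of them and validating on the remaining one, and rotating so that each fold serves as the validation set once.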
My training set is non-temporal, and consists of different malware and benign files, so it looks like a good use case for cross-validation.
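
To make the third point concrete, here is a minimal sketch of the split I have in mind, using only the Python standard library. It is not how fast.ai does it. The function names, the file names, the 20% test fraction, and the seed are all my own assumptions for illustration, and the split is a plain shuffle that does not stratify by class.

```python
import random

def split_holdout_and_folds(paths, n_folds=5, test_fraction=0.2, seed=42):
    """Set aside a test set, then split the remaining files into n_folds folds.

    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]   # never used for training or validation
    rest = shuffled[n_test:]

    # Deal the remaining files round-robin so fold sizes differ by at most one.
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return test, folds

def cross_validation_splits(folds):
    """Yield (train, valid) pairs; each fold is the validation set exactly once."""
    for i, valid in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, valid

# Hypothetical file names standing in for the malware and benign samples.
files = [f"sample_{i:03d}.bin" for i in range(100)]
test_files, folds = split_holdout_and_folds(files)

for k, (train_files, valid_files) in enumerate(cross_validation_splits(folds)):
    # A model would be trained on train_files and scored on valid_files here.
    print(f"fold {k}: {len(train_files)} train, {len(valid_files)} valid")
```

The test files would only be scored once, after the five validation results have been compared.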