The purpose of the sample folder: is my understanding correct?


Please let me know if the statements below are correct:
1. Training folder: Stores the training data, with each file in its appropriate label folder. This data is used to train the model, i.e. to learn the weights, and is vital to the success of the model.
2. Validation folder: Stores the validation data, again in label folders laid out exactly like the training folder. It is used to validate the weights learned from the training data and is also vital.
3. Test folder: Stores the test data under a single "unknown" label folder. The weights learned from the training data are used to predict/classify this unlabeled data.
4. Sample folder: A fraction of the data from the training folder, copied over for the purpose of writing the network code and testing it.

I think I might have gotten some of the terms/keywords wrong.

Thank You.

Yes, you are right. The reason we want to have the sample folder is that, in the case of the dogs vs cats dataset, training the entire VGG16 model on all the training data takes around 6 minutes for a single epoch! Not that you would probably want to train VGG16 from scratch, but even if a full epoch takes tens of seconds, that already makes it hard to iterate and test ideas quickly.

So what we do is take a sample of the training data and train our model on that. We use the sample to try ideas out quickly, and all we care about is that the sample is big enough for the insights we gain from it to generalize to the full dataset.
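In case it helps, here is a minimal sketch of how such a sample folder could be built. It assumes the training folder has one subfolder per label (for example `train/cats` and `train/dogs`); the function name `make_sample`, the paths, and the default fraction are my own placeholders, not something from the course notebooks.

```python
import random
import shutil
from pathlib import Path


def make_sample(train_dir, sample_dir, fraction=0.1, seed=0):
    """Copy a random fraction of every label folder in train_dir into sample_dir.

    Hypothetical helper: assumes train_dir contains one subfolder per label.
    Returns a dict mapping each label to the number of files copied.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    train_dir, sample_dir = Path(train_dir), Path(sample_dir)
    copied = {}
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        # Keep at least one file per label so no class vanishes from the sample.
        n = max(1, int(len(files) * fraction)) if files else 0
        target = sample_dir / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n):
            # Copy (not move): the full training set stays untouched.
            shutil.copy2(f, target / f.name)
        copied[class_dir.name] = n
    return copied
```

Something like `make_sample("data/train", "data/sample/train", fraction=0.1)` would then give you a small copy to iterate on. The paths here are only examples.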

I think what you wrote is correct!

Thank you for your reply, Radek. Going forward I will code and test against the sample folder, and when I actually need the final weights, I will use the full training and validation data.
In a few of the examples, like the lesson 1, 2, and 3 notebooks, I did observe Jeremy using the full training and validation data. But in the batch normalization example, the training and validation accuracies that were discussed were for the sample data.

Thank you.

Yes, you are correct. Creating a sample folder saves time while checking and correcting the network and other parameters.

Where do you get the validation data from?

If the dataset is from Kaggle, the validation data should be part of the download. Otherwise you can create your own by setting aside a small sample of the training set.
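A rough sketch of creating your own validation set, assuming the same one-subfolder-per-label layout as the training folder. The name `split_validation`, the paths, and the 20% default are assumptions for illustration only. Note that it moves files rather than copying them, so the model is never validated on images it was trained on.

```python
import random
import shutil
from pathlib import Path


def split_validation(train_dir, valid_dir, fraction=0.2, seed=0):
    """Move a random fraction of every label folder from train_dir to valid_dir.

    Hypothetical helper: assumes train_dir contains one subfolder per label.
    Returns a dict mapping each label to the number of files moved.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_dir, valid_dir = Path(train_dir), Path(valid_dir)
    moved = {}
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        n = int(len(files) * fraction)
        target = valid_dir / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n):
            # Move (not copy): training and validation sets stay disjoint.
            shutil.move(str(f), str(target / f.name))
        moved[class_dir.name] = n
    return moved
```

Run it once, before any training, for example `split_validation("data/train", "data/valid")`. Running it a second time would move a further share of the remaining training files.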