I’m going through the course right now and the model training doesn’t seem to take that into account. Is there a reason why the code doesn’t look at the performance on a validation dataset while it’s training?
Take a look at the model data objects in the library. They essentially wrap data loaders for the training, validation, and test sets. The split is usually (though not always) specified by an array of indexes marking which rows belong to the validation set. During training, performance is always evaluated on the validation set.
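To make the "array of indexes" idea concrete, here's a minimal sketch of picking random validation indexes with NumPy. This isn't fastai's actual implementation, just an illustration of the idea; `n` and `val_frac` are made-up values for the example.

```python
import numpy as np

n = 1000        # total number of rows in the dataset (toy value)
val_frac = 0.2  # hold out 20% of rows for validation

rng = np.random.default_rng(42)
# Sample 20% of the row indexes without replacement; these rows
# form the validation set, everything else stays in training.
val_idxs = rng.choice(n, size=int(n * val_frac), replace=False)
```

You'd then pass an array like `val_idxs` to whichever model data constructor accepts validation indexes.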
Can you point me to the source code please? Much appreciated!
Depending on your application, you'll need to look up the right file in the fastai library (https://github.com/fastai/fastai/tree/master/fastai).
There are several ways to get your validation dataset (this list is not exhaustive):
- split manually (hand picked, randomly, time based if you use time series data, etc.)
- use the supplied `get_cv_idxs` function
- use train_test_split from sklearn.model_selection (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
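For the last option, here's a small sketch of how `train_test_split` is typically used; the feature matrix and labels are toy data made up for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy feature matrix: 10 rows, 2 columns
y = np.arange(10)                 # toy labels, one per row

# Hold out 20% of the rows as a validation set; fixing random_state
# makes the split reproducible across runs.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

With `test_size=0.2` on 10 rows, you get 8 training rows and 2 validation rows.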