Does fast.ai split training data into training/validation?

maxmatical · June 22, 2018, 3:29pm

I’m going through the course right now, and the model training doesn’t seem to use a validation split. Is there a reason why the code doesn’t look at performance on a validation dataset while it’s training?

Even · June 22, 2018, 6:00pm

Take a look at the model data objects in the library. They essentially wrap the data loaders for the training, validation, and test sets. The split is usually (but not always) specified as an array of indexes identifying which rows go into the validation set. Performance during training is always evaluated on the validation set.
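
To make the index-based split concrete, here is a rough plain-Python sketch. It is not the fastai implementation; `split_by_idxs` is a hypothetical helper name used only for illustration.

```python
def split_by_idxs(items, val_idxs):
    """Partition items into (train, valid) given the validation indexes."""
    val_set = set(val_idxs)
    # Everything not listed as a validation index stays in the training set.
    train = [x for i, x in enumerate(items) if i not in val_set]
    valid = [items[i] for i in sorted(val_set)]
    return train, valid

items = ["a", "b", "c", "d", "e"]
train, valid = split_by_idxs(items, val_idxs=[1, 3])
```

Here `train` holds the rows at positions 0, 2, and 4, and `valid` holds the rows at positions 1 and 3.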

maxmatical · June 22, 2018, 7:37pm

Can you point me to the source code please? Much appreciated!

MicPie · June 24, 2018, 7:22am

Hey @maxmatical

Depending on your application, you have to look in the relevant file of the fastai library (https://github.com/fastai/fastai/tree/master/fastai).

There are several ways to get your validation dataset (this list is not exhaustive):

- split manually (hand picked, randomly, time based if you use time series data, etc.)
- use the supplied ‘get_cv_idxs’ function
- use train_test_split from sklearn.model_selection (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
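
As a minimal sketch of the manual options (random and time based), using only the Python standard library: the function names, the 20% default, and the seed are assumptions for illustration, and this is not the library's own code.

```python
import random

def random_val_idxs(n, val_pct=0.2, seed=42):
    """Pick int(n * val_pct) distinct row indexes at random for validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return sorted(rng.sample(range(n), int(n * val_pct)))

def time_based_val_idxs(n, val_pct=0.2):
    """Use the most recent rows (assumes rows are sorted oldest to newest)."""
    n_val = int(n * val_pct)
    return list(range(n - n_val, n))

val_random = random_val_idxs(100)      # 20 randomly chosen indexes
val_recent = time_based_val_idxs(100)  # the last 20 indexes
```

Either list of indexes can then be used to mark which rows belong to the validation set. For time series, the time-based variant avoids validating on rows that are older than the training data.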

Best regards
Michael