Why are test and validation sets the same in lesson 4 tabular notebook?

wgpubs · November 25, 2018, 12:07am

In the lesson 4 tabular notebook, is there any particular reason why the validation and test sets reference the same examples?

mmarvick · July 22, 2019, 3:15pm

I’m also curious about this. Here’s the relevant code for anyone who wants to chime in. We use the same rows (positions 800 to 999, i.e. range(800, 1000)) for both sets.

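```python
from fastai.tabular import *  # fastai v1; df, path, cat_names, cont_names, procs and dep_var are defined earlier in the notebook

# Test set: rows 800-999 of df (no procs passed here)
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)

# Validation set: the same rows 800-999, selected via split_by_idx
data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800, 1000)))
        .label_from_df(cols=dep_var)
        .add_test(test)
        .databunch())
```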
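To make the overlap concrete, here is a small pandas sketch. It uses a hypothetical 1,200-row stand-in for the notebook's df (not the real adult dataset) and checks that the positional slice used for the test set covers exactly the same rows as the indices passed to split_by_idx. The second half shows one possible way to keep the test rows out of training and validation entirely; the sizes are illustrative, not taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df (the real one is the adult dataset).
df = pd.DataFrame({"x": range(1200)})

# What the notebook does: validation rows via split_by_idx ...
valid_idx = list(range(800, 1000))
# ... and test rows via a positional slice over the same range.
test_idx = list(df.iloc[800:1000].index)  # default RangeIndex, so labels == positions

overlap = set(valid_idx) & set(test_idx)
print(len(valid_idx), len(test_idx), len(overlap))  # 200 200 200

# One possible disjoint alternative (illustrative sizes): hold out the last
# 200 rows as the test set and split validation from the remaining rows only.
test_df = df.iloc[1000:1200].copy()
train_valid_df = df.iloc[:1000].copy()
new_valid_idx = list(range(800, 1000))  # positions within train_valid_df

print(set(test_df.index).isdisjoint(train_valid_df.index))  # True
```

With a split like this, you would build the main TabularList from train_valid_df and pass a TabularList built over test_df to add_test, so no test row is ever seen during training or validation.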

mmarvick · July 22, 2019, 3:17pm

A follow-up question: when we create the test data, we don’t pass the processors (procs) that we pass for the main databunch. Why is that?

muellerzr · July 22, 2019, 3:32pm

I believe it was just shown as an example. As for the second question: when we call add_test, the same procs are applied to the test set, and normalization reuses the mean and standard deviation computed on the training set rather than recomputing them. Since we do add_test
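The idea behind reusing the training statistics can be sketched in plain Python. NormalizeSketch below is a hypothetical class, not fastai's Normalize: it learns a mean and standard deviation once from training values and then applies those same numbers to any other split, which is why the test set does not need its own procs or its own statistics. The use of a population standard deviation (pstdev) is an arbitrary choice for the sketch and is not meant to match fastai's exact computation.

```python
from statistics import mean, pstdev


class NormalizeSketch:
    """Hypothetical stand-in for a Normalize proc: learns stats once, reuses them."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, values):
        # Statistics come from the training rows only.
        self.mean = mean(values)
        self.std = pstdev(values)
        return self

    def transform(self, values):
        # Validation/test rows reuse the training statistics unchanged.
        return [(v - self.mean) / self.std for v in values]


train_age = [20.0, 30.0, 40.0, 50.0]
test_age = [30.0, 60.0]

norm = NormalizeSketch().fit(train_age)
print(norm.mean)  # 35.0
print(norm.transform(test_age))
```

Fitting on the test rows instead would give them a different scale from the data the model was trained on, which is the mismatch that sharing the training procs avoids.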