In the lesson 4 tabular notebook, is there any particular reason why the validation and test sets reference the same examples?
I’m also curious about this. Here’s the relevant code for anyone who wants to chime in. Both sets use the same rows, 800 to 999 (df.iloc[800:1000] for the test set and range(800, 1000) for the validation split).
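```python
from fastai.tabular import *  # fastai v1, as imported at the top of the notebook

# df, path, dep_var, cat_names, cont_names and procs are defined in earlier cells.
# Test set: rows 800-999, built without passing procs.
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path,
                           cat_names=cat_names, cont_names=cont_names)

# Labelled data: the same rows 800-999 become the validation set.
data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                            cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800, 1000)))
        .label_from_df(cols=dep_var)
        .add_test(test)
        .databunch())
```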
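One way to keep the test set separate from both training and validation is to hold out disjoint position ranges. Here is a minimal pure-Python sketch, assuming contiguous ranges like the notebook's; disjoint_split and the 1200-row example are hypothetical, not part of fastai or the notebook.

```python
def disjoint_split(n_rows, valid_size, test_size):
    """Return (train, valid, test) row positions with no overlap between them.

    Hypothetical helper: the last `test_size` rows are held out as the test set,
    and the `valid_size` rows just before them become the validation set.
    """
    if valid_size + test_size > n_rows:
        raise ValueError("valid_size + test_size exceeds n_rows")
    valid_start = n_rows - valid_size - test_size
    test_start = n_rows - test_size
    train_idx = list(range(0, valid_start))
    valid_idx = list(range(valid_start, test_start))
    test_idx = list(range(test_start, n_rows))
    return train_idx, valid_idx, test_idx

# The notebook's setup: both sets cover positions 800..999, so they overlap completely.
notebook_valid = set(range(800, 1000))
notebook_test = set(range(800, 1000))  # same positions as df.iloc[800:1000]
overlap = len(notebook_valid & notebook_test)  # 200

# Hypothetical alternative over 1200 rows: valid is still 800..999, test is 1000..1199.
train_idx, valid_idx, test_idx = disjoint_split(1200, 200, 200)
```

Assuming this layout, you would build the labelled DataBunch from df.iloc[:len(train_idx) + len(valid_idx)] so the test rows never reach training (split_by_idx treats every row outside the validation indices as training data), pass valid_idx to split_by_idx, and build the test TabularList from df.iloc[test_idx].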
A follow-up question: when we create the test data, we don’t pass the procs that we use for the main DataBunch. Why is that?
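I believe the overlap was just shown as an example. As for the second question: the same procs are applied to the test set, because it is attached with add_test rather than processed on its own. The test rows go through the processors fitted on the training data, so Normalize uses the training set’s mean and standard deviation.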
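To make the "fitted on the training data" point concrete, here is a minimal pure-Python sketch of the fit-then-apply pattern. It is a simplified stand-in, not fastai’s implementation: the helper names and toy columns are hypothetical, and the Categorify analogue simply reserves code 0 for categories never seen in training.

```python
from statistics import mean, stdev

def fit_normalize(train_col):
    """Learn normalization statistics from the training column only."""
    return mean(train_col), stdev(train_col)

def apply_normalize(col, stats):
    """Apply previously fitted statistics to any split (train, valid or test)."""
    mu, sigma = stats
    return [(x - mu) / sigma for x in col]

def fit_categorify(train_col):
    """Give each category seen in training an integer code; 0 is kept for unseen values."""
    return {cat: i + 1 for i, cat in enumerate(sorted(set(train_col)))}

def apply_categorify(col, codes):
    """Encode a column with the training-time mapping; unseen categories become 0."""
    return [codes.get(x, 0) for x in col]

# Hypothetical toy columns standing in for a continuous and a categorical feature.
train_age = [20.0, 30.0, 40.0, 50.0]
test_age = [30.0, 60.0]
age_stats = fit_normalize(train_age)              # mean 35.0, sample std ~12.91
test_age_norm = apply_normalize(test_age, age_stats)

train_work = ["Private", "State-gov", "Private"]
test_work = ["State-gov", "Never-worked"]
work_codes = fit_categorify(train_work)           # {"Private": 1, "State-gov": 2}
test_work_codes = apply_categorify(test_work, work_codes)  # [2, 0]
```

The design point is that the test column never contributes to the statistics: normalizing it with its own mean (45.0 here) would map the same raw value to a different number than the model saw during training.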