Going through the lesson 4 tabular part I see the following lines are defining 200 lines (from 800 to 1000) as the validation set:
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names) data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs) .split_by_idx(list(range(800,1000))) .label_from_df(cols=dep_var) .add_test(test) .databunch())
The adult.csv file has 32561 rows (+ header) so 200 lines as the validation set is around 0.6% of the data. Is that a meaningful enough percentage? In previous lessons, the validation set is typically 10%-20% of our data. Is the logic different for tabular data sets?
Thanks a lot