Hi,
I am using the example from Lesson 6 (Rossmann) on a dataset of similar complexity/structure, but of much larger size (a 3 GB CSV file). The problem is that I cannot process the data: the TabularList becomes too big and exceeds my 16 GB of RAM. Are there any settings that would decrease its size? I can only think of dropping some of the categorical variables or reducing their cardinality.
data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
.split_by_idx(valid_idx)
.label_from_df(cols=dep_var, label_cls=FloatList, log=True)
.add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
.databunch())
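Besides dropping variables, one option I am considering is shrinking the DataFrame itself before building the TabularList. Below is a minimal sketch, assuming pandas; the column names are illustrative, not from my actual dataset:

```python
# Sketch: reduce a DataFrame's memory footprint by downcasting numeric
# columns and converting repeated strings to the category dtype.
# Illustrative only; column names are hypothetical.
import numpy as np
import pandas as pd

def shrink_df(df):
    """Downcast numerics and convert object columns to category dtype in place."""
    for col in df.columns:
        if pd.api.types.is_float_dtype(df[col]):
            # float64 -> float32 where values allow it
            df[col] = pd.to_numeric(df[col], downcast="float")
        elif pd.api.types.is_integer_dtype(df[col]):
            # int64 -> smallest integer type that fits the range
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif pd.api.types.is_object_dtype(df[col]):
            # category stores each distinct string once plus small codes
            df[col] = df[col].astype("category")
    return df

df = pd.DataFrame({
    "store": ["a", "b", "c"] * 1000,            # low-cardinality strings
    "sales": np.arange(3000, dtype="float64"),  # wide float type
})
before = df.memory_usage(deep=True).sum()
df = shrink_df(df)
after = df.memory_usage(deep=True).sum()
assert after < before
```

Reading the CSV in chunks with `pd.read_csv(..., chunksize=...)` and shrinking each chunk before concatenating would keep the peak usage lower still, though I am not sure how much of the overhead comes from the DataFrame versus the TabularList processing itself.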