TabularList.from_df is too big - any way to decrease its size?

Hi,
I am using an example from Lesson 6 (Rossmann) on a dataset of similar complexity/structure, but of much larger size (a 3 GB CSV file). The problem is that I cannot process the data, as the TabularList becomes too big and exceeds my 16 GB of RAM. Are there any settings that would let me decrease its size? I can only think of dropping some of the category variables or reducing their cardinality.

data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                .split_by_idx(valid_idx)
                .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
                .add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
                .databunch())
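One of the ideas you mention, reducing cardinality, can be sketched with plain pandas before the DataFrame ever reaches `TabularList.from_df`. This is a minimal illustration, not part of the fastai API; the function name `collapse_rare_categories` and the parameters `min_count` and `other_label` are made up for the example. Collapsing rare levels into a single placeholder shrinks both the encoded data and the embedding tables fastai builds for each categorical variable.

```python
import pandas as pd

def collapse_rare_categories(df, cat_names, min_count=100, other_label="_other_"):
    """Replace category values that occur fewer than `min_count` times
    with a single placeholder, reducing each column's cardinality."""
    for col in cat_names:
        counts = df[col].value_counts()
        rare = counts[counts < min_count].index
        # Keep frequent values; map everything else to the placeholder.
        df[col] = df[col].where(~df[col].isin(rare), other_label)
    return df

# Hypothetical usage before building the TabularList:
# df = collapse_rare_categories(df, cat_vars, min_count=50)
```

You would need to apply the same mapping to the test DataFrame so train and test share the same category levels.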

On Kaggle I know I've seen memory reduction techniques; here is an example of one. See if that helps :slight_smile:


Thanks, I tried some of those techniques, but it did not help: as soon as I create the TabularList object, I hit the RAM limit and the kernel dies.