That’s not much info to go off of, but you can pass inplace=True to TabularPandas and that should help some. You need to set pandas’ chained-assignment mode to None first, though (pd.options.mode.chained_assignment = None).
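For reference, a minimal sketch of the two steps; the TabularPandas call is commented out and purely illustrative (it assumes you already have df, procs, cat_names, and cont_names defined):

```python
import pandas as pd

# Silence the SettingWithCopy machinery so TabularPandas is allowed
# to modify the DataFrame in place rather than copying it.
pd.options.mode.chained_assignment = None

# Illustrative fastai call (assumes df, procs, cat_names, cont_names exist):
# to = TabularPandas(df, procs, cat_names, cont_names,
#                    y_names="salary", inplace=True)

print(pd.options.mode.chained_assignment)  # → None
```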
Hi, answering my own question: I seem to have found a workable way of feeding tabular data that does not fit into memory as a single DataFrame. The idea is a Callback that resets learn.dls with a new training chunk at the start of every epoch.
Creating the initial dls:
# start_df contains validation rows as well as the first chunk of training set
to = TabularPandas(start_df, procs, cat_names, cont_names, y_names="salary",
                   y_block=CategoryBlock(), splits=splits, do_setup=True)
trn_dl = TabDataLoader(to.train)
val_dl = TabDataLoader(to.valid)
dls = DataLoaders(trn_dl, val_dl).cuda()
The Callback:
# train_chunk_generator returns the next chunk of training data
class ReloadCallback(Callback):
    def begin_epoch(self):
        df = next(next_chunk)
        to_new = to.new(df)
        to_new.process()
        trn_dl = TabDataLoader(to_new.train)
        val_dl = TabDataLoader(to.valid)
        self.learn.dls = DataLoaders(trn_dl, val_dl).cuda()
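You then pass the callback to the learner via cbs, e.g. tabular_learner(dls, ..., cbs=ReloadCallback()). The underlying pattern (a per-epoch hook that swaps the training data) can be illustrated without fastai at all; everything below is a hypothetical stand-in to show the control flow, not fastai’s actual Learner/Callback classes:

```python
class ReloadCallback:
    """Toy callback: swaps in the next data chunk at the start of each epoch."""
    def __init__(self, chunks):
        self.chunks = iter(chunks)

    def begin_epoch(self, learn):
        learn.train_data = next(self.chunks)

class TinyLearner:
    """Toy learner that calls the callback hook once per epoch."""
    def __init__(self, cb):
        self.cb = cb
        self.train_data = None
        self.seen = []

    def fit(self, n_epochs):
        for _ in range(n_epochs):
            self.cb.begin_epoch(self)        # callback swaps in a new chunk
            self.seen.append(self.train_data)  # "train" on that chunk

learn = TinyLearner(ReloadCallback([[1, 2], [3, 4], [5, 6]]))
learn.fit(3)
print(learn.seen)  # → [[1, 2], [3, 4], [5, 6]]
```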