I’ve been trying to make the following code run:
```python
data_lm = (TextList.from_df(df=df_train, cols=['body', 'source', 'headline'])
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=256, bptt=80, num_workers=0))
```
Unfortunately it’s very slow at processing my complete dataset (29,000 rows), though it’s quick for anything below 10,000 rows. Is it possible for me to create two or three smaller datasets and then concatenate them into one big TextList/DataBunch?
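For what it’s worth, splitting the DataFrame itself into chunks is easy with plain pandas/NumPy (a sketch with a toy DataFrame standing in for `df_train`); the part I’m unsure about is merging the resulting TextLists/DataBunches afterwards:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_train, just to illustrate the split
df_train = pd.DataFrame({
    'body': [f'body {i}' for i in range(9)],
    'source': ['src'] * 9,
    'headline': [f'head {i}' for i in range(9)],
})

# Split into three roughly equal DataFrames
chunks = np.array_split(df_train, 3)

# Each chunk could then be fed to TextList.from_df separately;
# the open question is how to concatenate the results.
print([len(c) for c in chunks])  # → [3, 3, 3]
```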