Concatenating two textlists/databunches

(Henry Dashwood) #1

Hello,

I’ve been trying to make the following code run:

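from fastai.text import *  # fastai v1; provides TextList

data_lm = (TextList.from_df(df=df_train, cols=['body', 'source', 'headline'])
               .split_by_rand_pct(0.1)   # hold out 10% of rows for validation
               .label_for_lm()           # label for language modelling
               .databunch(bs=256, bptt=80, num_workers=0))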

Unfortunately, it is very slow to process my complete dataset (29,000 rows), although it is quick for anything below 10,000 rows. Is it possible to create two or three smaller datasets and then concatenate them into one big TextList/DataBunch?
