How is vocab constructed, precisely?

(Marcin Kowalski) #1

Hi.

In the code below, does the split between training and validation sets influence the effective vocab? In other words, is the vocab built from the train AND validation sets, or from the train set only?

data_lm = TextLMDataBunch.from_df(train_df=df_trn, valid_df=df_val, path="", text_cols=['text'], label_cols=['label'], bs=64)
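
To make the question concrete, here is a minimal sketch in plain Python of the two behaviours I'm trying to tell apart. This is not fastai code: `build_vocab` is a hypothetical helper that uses whitespace tokenization, and the toy texts are made up for illustration.

```python
from collections import Counter

def build_vocab(texts, min_freq=1):
    """Hypothetical helper: whitespace-tokenize and keep tokens seen at least min_freq times."""
    counts = Counter(tok for text in texts for tok in text.split())
    return {tok for tok, n in counts.items() if n >= min_freq}

# Toy stand-ins for df_trn['text'] and df_val['text']
train_texts = ["the cat sat", "the dog ran"]
valid_texts = ["the bird sang"]

# Option A: vocab built from the training set only
vocab_train_only = build_vocab(train_texts)

# Option B: vocab built from training and validation sets together
vocab_train_and_valid = build_vocab(train_texts + valid_texts)

# Tokens that would be out-of-vocabulary in the validation set under option A
oov_in_valid = build_vocab(valid_texts) - vocab_train_only
print(sorted(oov_in_valid))  # ['bird', 'sang']
```

Under option A, moving rows between train and validation changes the vocab; under option B it would not, as long as the combined data stays the same. I'd like to know which of the two `TextLMDataBunch.from_df` does.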

Thanks!
