Looking at the source code for TextLMDataBunch, it seems like all of the datasets (train, validation, and test) get used for training, but I’m not certain. So, if I have the following code:
data_lm = TextDataBunch.from_df(path, train_df=df_trn, valid_df=df_val, test_df=test_df_fixed, text_cols=['question1', 'question2'], label_cols=['is_duplicate'])
data_lm = TextLMDataBunch.load(path, 'tmp', bs=bs)
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103_1, drop_mult=0.5)
Is it safe to assume that the text in test_df_fixed is being used to train the language model, along with the text in df_trn and df_val?
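In case the answer is no, here is a minimal sketch of how the test text could be folded into the language model training data explicitly, so nothing depends on that assumption. It uses plain pandas only. `build_lm_frames` and the toy data frames are hypothetical, made up for illustration, and are not part of fastai.

```python
import pandas as pd

# Text columns from the question; everything else below is illustrative.
text_cols = ['question1', 'question2']

def build_lm_frames(df_trn, df_val, test_df, text_cols):
    """Hypothetical helper: append the unlabeled test text to the training
    text for language-model training, and keep df_val for validation."""
    # Only the text columns are kept; a language model does not need labels.
    lm_trn = pd.concat([df_trn[text_cols], test_df[text_cols]], ignore_index=True)
    lm_val = df_val[text_cols].reset_index(drop=True)
    return lm_trn, lm_val

# Toy stand-ins for df_trn, df_val and test_df_fixed (placeholder data).
df_trn = pd.DataFrame({'question1': ['a b', 'c d'],
                       'question2': ['a c', 'c e'],
                       'is_duplicate': [1, 0]})
df_val = pd.DataFrame({'question1': ['e f'],
                       'question2': ['e g'],
                       'is_duplicate': [0]})
test_df_fixed = pd.DataFrame({'question1': ['g h'],
                              'question2': ['g i']})

lm_trn, lm_val = build_lm_frames(df_trn, df_val, test_df_fixed, text_cols)
print(len(lm_trn), len(lm_val))
```

The resulting lm_trn and lm_val could then be passed as train_df and valid_df in place of df_trn and df_val, assuming the test text is not already picked up automatically.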