I'm attempting to do batch inference with my trained LM in order to get the document hidden states (e.g. the `outputs` and `raw_outputs` produced by the `LinearDecoder`).

I just noticed that when specifying a `test` ItemList for `load_learner`, the sampler for all dataloaders is a `SequentialSampler` … whereas I expected train to use `SortishSampler` and both the validation and test sets to use `SortSampler`.

```
inf_items = TextList.from_df(inf_df, path=LM_PATH, cols=corpus_cols)
inf_learn = load_learner(LM_PATH, f'{pre}export_lm.pkl', test=inf_items)
inf_learn.model = inf_learn.model.to(device)
inf_learn.model = inf_learn.model.eval()
inf_learn.data.train_dl.sampler, inf_learn.data.valid_dl.sampler, inf_learn.data.test_dl.sampler
# (<torch.utils.data.sampler.SequentialSampler at 0x7f1e1f6b2240>,
# <torch.utils.data.sampler.SequentialSampler at 0x7f1e1f6b22e8>,
# <torch.utils.data.sampler.SequentialSampler at 0x7f1e1f6b2438>)
```
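To make the discrepancy concrete, here is a minimal pure-Python sketch (not fastai's actual code) of the two orderings: a `SortSampler`-style sampler yields indices sorted by text length, longest first, so each batch contains similarly sized documents and padding is minimized, while a `SequentialSampler` just yields `0..n-1`. The `doc_lengths` values are hypothetical token counts.

```
def sort_sampler_indices(lengths):
    """SortSampler-style order: dataset indices sorted by length, longest first."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

def sequential_indices(lengths):
    """What load_learner is actually giving me: plain 0..n-1."""
    return list(range(len(lengths)))

doc_lengths = [12, 87, 45, 3]             # hypothetical token counts
print(sort_sampler_indices(doc_lengths))  # -> [1, 2, 0, 3]
print(sequential_indices(doc_lengths))    # -> [0, 1, 2, 3]
```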

This also seems problematic in `<learner>.get_preds` because of this line:

```
if ordered and hasattr(self.dl(ds_type), 'sampler'):
```

… it will return `True` in this case even though the examples are already in sequential order, resulting in unnecessary re-sorting of what is already ordered.
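Here is my reading of why that check matters, as a hedged sketch rather than a copy of fastai's implementation: predictions come back in sampler order, and inverting the sampler's permutation restores the original dataset order. With a `SortSampler`-style permutation that inversion is necessary; with a `SequentialSampler` it is the identity, so the work is wasted.

```
def restore_order(preds, sampler_indices):
    """Invert the sampler permutation so preds line up with the dataset."""
    reordered = [None] * len(preds)
    for pos, idx in enumerate(sampler_indices):
        reordered[idx] = preds[pos]
    return reordered

preds = ['p0', 'p1', 'p2', 'p3']

# With a SortSampler-style permutation the reorder is needed...
print(restore_order(preds, [1, 2, 0, 3]))  # -> ['p2', 'p0', 'p1', 'p3']

# ...but with a SequentialSampler it is a no-op: pure overhead.
print(restore_order(preds, [0, 1, 2, 3]))  # -> ['p0', 'p1', 'p2', 'p3']
```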