I'm attempting to do batch inference with my trained LM in order to get the document hidden states (e.g. the `outputs` and `raw_outputs` produced by the `LinearDecoder`).

I just noticed that when specifying a `test` ItemList for `load_learner`, the sampler for all dataloaders is a `SequentialSampler` … whereas I expected train to use `SortishSampler` and both the validation and test sets to use `SortSampler`.

```
inf_items = TextList.from_df(inf_df, path=LM_PATH, cols=corpus_cols)
inf_learn = load_learner(LM_PATH, f'{pre}export_lm.pkl', test=inf_items)
inf_learn.model = inf_learn.model.to(device)
inf_learn.model = inf_learn.model.eval()
inf_learn.data.train_dl.sampler, inf_learn.data.valid_dl.sampler, inf_learn.data.test_dl.sampler
# (<torch.utils.data.sampler.SequentialSampler at 0x7f1e1f6b2240>,
# <torch.utils.data.sampler.SequentialSampler at 0x7f1e1f6b22e8>,
# <torch.utils.data.sampler.SequentialSampler at 0x7f1e1f6b2438>)
```
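To make the discrepancy concrete, here is a minimal pure-Python sketch (not fastai's actual code) of the two orderings: a `SortSampler`-style sampler yields indices sorted by text length, longest first, so each batch contains similarly sized documents and padding is minimized, while a `SequentialSampler` just yields `0..n-1`. The `doc_lengths` values are hypothetical token counts.

```
def sort_sampler_indices(lengths):
    """SortSampler-style order: dataset indices sorted by length, longest first."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

def sequential_indices(lengths):
    """What load_learner is actually giving me: plain 0..n-1."""
    return list(range(len(lengths)))

doc_lengths = [12, 87, 45, 3]             # hypothetical token counts
print(sort_sampler_indices(doc_lengths))  # -> [1, 2, 0, 3]
print(sequential_indices(doc_lengths))    # -> [0, 1, 2, 3]
```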

This also seems problematic in `<learner>.get_preds` because of this line:

```
if ordered and hasattr(self.dl(ds_type), 'sampler'):
```

… it will return `True` in this case even though the examples are already in sequential order, resulting in unnecessary re-sorting of what is already ordered.
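Here is my reading of why that check matters, as a hedged sketch rather than a copy of fastai's implementation: predictions come back in sampler order, and inverting the sampler's permutation restores the original dataset order. With a `SortSampler`-style permutation that inversion is necessary; with a `SequentialSampler` it is the identity, so the work is wasted.

```
def restore_order(preds, sampler_indices):
    """Invert the sampler permutation so preds line up with the dataset."""
    reordered = [None] * len(preds)
    for pos, idx in enumerate(sampler_indices):
        reordered[idx] = preds[pos]
    return reordered

preds = ['p0', 'p1', 'p2', 'p3']

# With a SortSampler-style permutation the reorder is needed...
print(restore_order(preds, [1, 2, 0, 3]))  # -> ['p2', 'p0', 'p1', 'p3']

# ...but with a SequentialSampler it is a no-op: pure overhead.
print(restore_order(preds, [0, 1, 2, 3]))  # -> ['p0', 'p1', 'p2', 'p3']
```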