Hi, what is the recommended way to use a backwards model (and thus an ensemble of fwd + bwd) in production? For the forward model I have followed the inference tutorial: I am using `add_test` and `learner.get_preds` with `ds_type=DatasetType.Test` and `ordered=True`. Looking at `learner.data.test_ds.x`, I can see that the text is properly tokenized, just like during training.
However, if I export the backward learner and then use `add_test`, the resulting tokens in `learn_bwd.data.test_ds.x` are not (yet?) reversed. Moreover, if I look at the bwd learner after training and before the export, I can see that the train/valid/single data loaders have `backwards` set to `True`. When I load the exported bwd learner, the dls have `backwards=False`.
Apart from that, there is no “test” dataloader. Thus, I am a bit confused here.
I guess I could just reverse the final input tensor with the word IDs, but then there are two things I am worried about:
- Do I really get the batching right?
- What if I want to change the transformations during training? I have often seen better LM scores for the backwards model during LM refinement that didn’t translate to the classifier. I suspect this might be because a backwards `xxmaj` might be too easy to predict. Probably there is not much left to improve, but at least I would like to keep the option.
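To make the batching worry concrete: with `pad_first=True`, naively reversing a padded row also moves the padding from the front to the back. A minimal pure-Python sketch (no fastai or torch needed; `reverse_tokens` is my own illustrative helper, not a fastai function, and I am assuming the pad-first convention that fastai's `pad_collate` uses):

```python
PAD = 1  # fastai's default pad_idx

def reverse_tokens(row, pad_idx=PAD):
    """Reverse only the real tokens, keeping padding at the front
    (pad_first=True convention)."""
    # find where the real tokens start
    n_pad = next((i for i, t in enumerate(row) if t != pad_idx), len(row))
    pads, toks = row[:n_pad], row[n_pad:]
    return pads + toks[::-1]

row = [1, 1, 10, 11, 12]      # two pads, then tokens 10 11 12
print(reverse_tokens(row))    # → [1, 1, 12, 11, 10]
print(row[::-1])              # naive reversal: [12, 11, 10, 1, 1] — pads end up last
```

So simply flipping the batch tensor along the sequence dimension would feed the model padding in the wrong place; the reversal has to happen per-row, around the padding.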
UPDATE:
So far I have discovered that it is expected behaviour that the tokens in the dataset are not yet reversed.
If I understand everything correctly, `get_preds` will eventually use `learn_bwd.dl()`. With that in mind, I see two new problems with my current code:
- `learn_bwd.dl()` uses a collate_fn with `backwards=True` before the export and `backwards=False` after the export.
- When trying to use an ensemble, a pattern like `learn_fwd.add_test(my_batch)` followed by `learn_bwd.add_test(my_batch)` seems bad, because it would do the identical tokenization twice.
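For reference, the ensembling step itself is cheap once both sets of predictions exist: the usual ULMFiT-style ensemble simply averages the two classifiers' predicted probabilities. A minimal sketch (the helper name is my own, not a fastai API):

```python
def ensemble_probs(p_fwd, p_bwd):
    """Average the class probabilities of the forward and backward classifiers."""
    return [(a + b) / 2 for a, b in zip(p_fwd, p_bwd)]

# e.g. combining two (negative, positive) probability pairs
print(ensemble_probs([0.9983, 0.0017], [1.0, 0.0]))
```

The expensive, duplicated part is the tokenization; averaging itself is a one-liner, so ideally the texts would be tokenized once and only numericalized/reversed per direction.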
UPDATE 2:
I think one might even call this a bug in TextClassifierLearner’s export/load functionality if presented like this:
learn.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([0.9983, 0.0017]))
learn_bwd.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([1.0000e+00, 1.4775e-07]))
learn.export(path/'tmp')
learn_from_export = load_learner(path, 'tmp')
learn_bwd.export(path/'tmp_bwd')
learn_bwd_from_export = load_learner(path, 'tmp_bwd')
learn_from_export.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([0.9983, 0.0017]))
learn_bwd_from_export.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([9.9999e-01, 1.0735e-05]))
As you can see, the forward learner is robust to export → load, whereas the backward learner isn’t.
I strongly suspect this is due to:
learn_bwd.dl()
>>> DeviceDataLoader(dl=<torch.utils.data.dataloader.DataLoader object at 0x7f5cf05dc0f0>, device=device(type='cuda', index=0), tfms=[], collate_fn=functools.partial(<function pad_collate at 0x7f5cf21eec80>, pad_idx=1, pad_first=True, backwards=True))
learn_bwd_from_export.dl()
>>> DeviceDataLoader(dl=<torch.utils.data.dataloader.DataLoader object at 0x7f5c8c274a58>, device=device(type='cuda', index=0), tfms=[], collate_fn=functools.partial(<function pad_collate at 0x7f5cf21eec80>, pad_idx=1, pad_first=True, backwards=False))
The `backwards=True` keyword in the partial application of `pad_collate` is not preserved through export/load.
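Until that is fixed, one workaround might be to rebuild the collate_fn on the loaded learner with the flag restored. Below is a sketch of just the `functools.partial` mechanics, using a stand-in `pad_collate` (the real one comes from fastai; on the actual learner you would then assign the rebuilt partial to the loaded dataloader's `collate_fn`, assuming that attribute is the one consulted at iteration time — I have not verified that):

```python
from functools import partial

def pad_collate(samples, pad_idx=1, pad_first=True, backwards=False):
    # Stand-in for fastai's pad_collate; only the keyword handling matters here.
    return (pad_idx, pad_first, backwards)

# What load_learner currently produces: backwards has been reset to False.
broken = partial(pad_collate, pad_idx=1, pad_first=True, backwards=False)

# Rebuild the partial with the flag restored, keeping all other keywords.
fixed = partial(broken.func, **{**broken.keywords, 'backwards': True})

print(broken([]))  # → (1, True, False)
print(fixed([]))   # → (1, True, True)
```

This only patches the symptom, of course; the exported learner should arguably round-trip the `backwards` flag itself.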