Hi, what is the recommended way to use a backwards model (and thus an ensemble of fwd + bwd) in production? For the forward model I have followed the inference tutorial: I am using `add_test` and `learner.get_preds` with `ds_type=DatasetType.Test` and `ordered=True`. Looking at `learner.data.test_ds.x`, I can see that the text is properly tokenized, just like during training.
However, if I export the backward learner and then use `add_test`, the resulting tokens in `learn_bwd.data.test_ds.x` are not (yet?) reversed. Moreover, if I look at the bwd learner after training and before the export, I can see that the train/valid/single data loaders have `backwards` set to `True`. When I load the exported bwd learner, the dls have `backwards=False`.
Apart from that, there is no “test” dataloader. Thus, I am a bit confused here.
I guess I could just reverse the final input tensor with the word IDs, but then there are two things I am worried about:
- Do I really get the batching right?
- What if I want to change the transformations during training? I have often seen better LM scores for the backwards model during LM refinement that didn’t translate to the classifier. I suspect this might be because a backwards `xxmaj` might be too easy to predict. Probably there is not much left to improve, but at least I would like to keep the option.
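To make the batching worry concrete: with `pad_first=True`, naively reversing a padded row also moves the padding from the front to the back. A minimal pure-Python sketch (no fastai or torch needed; `reverse_tokens` is my own illustrative helper, not a fastai function, and I am assuming the pad-first convention that fastai's `pad_collate` uses):

```python
PAD = 1  # fastai's default pad_idx

def reverse_tokens(row, pad_idx=PAD):
    """Reverse only the real tokens, keeping padding at the front
    (pad_first=True convention)."""
    # find where the real tokens start
    n_pad = next((i for i, t in enumerate(row) if t != pad_idx), len(row))
    pads, toks = row[:n_pad], row[n_pad:]
    return pads + toks[::-1]

row = [1, 1, 10, 11, 12]      # two pads, then tokens 10 11 12
print(reverse_tokens(row))    # → [1, 1, 12, 11, 10]
print(row[::-1])              # naive reversal: [12, 11, 10, 1, 1] — pads end up last
```

So simply flipping the batch tensor along the sequence dimension would feed the model padding in the wrong place; the reversal has to happen per-row, around the padding.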
UPDATE:
So far I have discovered that it is expected behaviour that the tokens in the dataset are not yet reversed.
If I understand everything correctly, `get_preds` will eventually use `learn_bwd.dl()`. With that in mind, I see two new problems with my current code:
- `learn_bwd.dl()` uses a collate_fn with `backwards=True` before the export and `backwards=False` after the export.
- When trying to use an ensemble, a pattern like `learn_fwd.add_test(my_batch)` followed by `learn_bwd.add_test(my_batch)` seems bad, because it would do the identical tokenization twice.
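For reference, the ensembling step itself is cheap once both sets of predictions exist: the usual ULMFiT-style ensemble simply averages the two classifiers' predicted probabilities. A minimal sketch (the helper name is my own, not a fastai API):

```python
def ensemble_probs(p_fwd, p_bwd):
    """Average the class probabilities of the forward and backward classifiers."""
    return [(a + b) / 2 for a, b in zip(p_fwd, p_bwd)]

# e.g. combining two (negative, positive) probability pairs
print(ensemble_probs([0.9983, 0.0017], [1.0, 0.0]))
```

The expensive, duplicated part is the tokenization; averaging itself is a one-liner, so ideally the texts would be tokenized once and only numericalized/reversed per direction.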
UPDATE 2:
I think one might even call this a bug in TextClassifierLearner’s export/load functionality if presented like this:
learn.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([0.9983, 0.0017]))
learn_bwd.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([1.0000e+00, 1.4775e-07]))
learn.export(path/'tmp')
learn_from_export = load_learner(path, 'tmp')
learn_bwd.export(path/'tmp_bwd')
learn_bwd_from_export = load_learner(path, 'tmp_bwd')
learn_from_export.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([0.9983, 0.0017]))
learn_bwd_from_export.predict("one two three four five")
>>> (Category negative, tensor(0), tensor([9.9999e-01, 1.0735e-05]))
As you can see, the forward learner is robust to export → load, whereas the backward learner isn’t.
I strongly suspect this is due to:
learn_bwd.dl()
>>> DeviceDataLoader(dl=<torch.utils.data.dataloader.DataLoader object at 0x7f5cf05dc0f0>, device=device(type='cuda', index=0), tfms=[], collate_fn=functools.partial(<function pad_collate at 0x7f5cf21eec80>, pad_idx=1, pad_first=True, backwards=True))
learn_bwd_from_export.dl()
>>> DeviceDataLoader(dl=<torch.utils.data.dataloader.DataLoader object at 0x7f5c8c274a58>, device=device(type='cuda', index=0), tfms=[], collate_fn=functools.partial(<function pad_collate at 0x7f5cf21eec80>, pad_idx=1, pad_first=True, backwards=False))
The `backwards=True` keyword in the partial application of `pad_collate` is not preserved through export/load.
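Until that is fixed, one workaround might be to rebuild the collate_fn on the loaded learner with the flag restored. Below is a sketch of just the `functools.partial` mechanics, using a stand-in `pad_collate` (the real one comes from fastai; on the actual learner you would then assign the rebuilt partial to the loaded dataloader's `collate_fn`, assuming that attribute is the one consulted at iteration time — I have not verified that):

```python
from functools import partial

def pad_collate(samples, pad_idx=1, pad_first=True, backwards=False):
    # Stand-in for fastai's pad_collate; only the keyword handling matters here.
    return (pad_idx, pad_first, backwards)

# What load_learner currently produces: backwards has been reset to False.
broken = partial(pad_collate, pad_idx=1, pad_first=True, backwards=False)

# Rebuild the partial with the flag restored, keeping all other keywords.
fixed = partial(broken.func, **{**broken.keywords, 'backwards': True})

print(broken([]))  # → (1, True, False)
print(fixed([]))   # → (1, True, True)
```

This only patches the symptom, of course; the exported learner should arguably round-trip the `backwards` flag itself.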