Bidirectional Code | ULMFiT

Hi

I want to use a bi-directional language model. Setting the bidir parameter to True, I get the following error. Please help!

Code
from fastai.text import *  # fastai v1

config = awd_lstm_lm_config.copy()
config['bidir'] = True
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, config=config, pretrained=False)
learn.lr_find()
learn.recorder.plot(skip_end=15)

Error

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in check_hidden_size(self, hx, expected_hidden_size, msg)
    170         # type: (Tensor, Tuple[int, int, int], str) -> None
    171         if hx.size() != expected_hidden_size:
--> 172             raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
    173
    174     def check_forward_args(self, input, hidden, batch_sizes):

RuntimeError: Expected hidden[0] size (2, 1, 1152), got (1, 1, 1152)

Are you using an older version of the library? I remember encountering this when the _one_hidden function was hardcoded to return a tensor of size (1, bs, n_hidden), but this has since been updated.
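
For reference, the shapes in the error message follow from how PyTorch sizes the initial hidden state of an LSTM: the first dimension is num_layers * num_directions, so a single bidirectional layer expects 2 where a hardcoded 1 was supplied. A minimal sketch in plain Python (the helper name expected_hidden_shape is made up for illustration and is not part of fastai or PyTorch):

```python
def expected_hidden_shape(num_layers, bidirectional, batch_size, hidden_size):
    """Shape PyTorch documents for h_0 of nn.LSTM:
    (num_layers * num_directions, batch, hidden_size)."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch_size, hidden_size)

# Values taken from the error message: one layer, batch size 1, hidden size 1152.
expected = expected_hidden_shape(1, True, 1, 1152)    # what the bidirectional LSTM expects
supplied = expected_hidden_shape(1, False, 1, 1152)   # what a hardcoded first dimension of 1 gives
print(expected, supplied)
```

The two tuples differ only in the first dimension, which is exactly the mismatch the RuntimeError reports.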

I don’t think so. I pip install fastai every day because I work on Google Colab.

[EDIT] I tried pip install fastai==1.0.42, but it breaks a few functions that worked before. How do you find out what the latest version of the library is? And how do you find version-specific code?

Take a look here. https://pypi.org/project/fastai/

I’m still getting the same error, this time with pip install fastai==1.0.52.

After some digging around, it appears that although the hardcoding was fixed in the GitHub repo, the problem is still there in the stable release of fastai that conda installs.

So to fix this problem you can temporarily monkey patch it this way:

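from functools import partial
from fastai.torch_core import Tensor, one_param  # fastai v1

def _one_hidden(self, l:int)->Tensor:
    # First dimension is n_dir (2 when bidir=True) instead of a hardcoded 1.
    nh = (self.n_hid if l != self.n_layers - 1 else self.emb_sz) // self.n_dir
    return one_param(self).new(self.n_dir, self.bs, nh).zero_()
learn.model[0]._one_hidden = partial(_one_hidden, learn.model[0])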

After that you can run lr_find as usual and train your model.

Note that you can’t train a bidirectional language model, as the targets can’t be shifted one position to the left and to the right at the same time. The bidir option has been left in for classifiers that don’t use any pretrained model.
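
To make the target-shifting point concrete, here is a small sketch in plain Python. The token list and the helper lm_pairs are made up for illustration; this is not fastai code. A forward model predicts the next token and a backward model predicts the previous one, while a bidirectional layer reads context from both sides, so it would already have seen the token it is asked to predict.

```python
def lm_pairs(tokens, backward=False):
    """Return (inputs, targets) for next-token prediction.
    For a backward model the text is reversed first."""
    seq = tokens[::-1] if backward else tokens
    return seq[:-1], seq[1:]

tokens = ["the", "cat", "sat", "down"]

fwd_x, fwd_y = lm_pairs(tokens)                 # targets shifted one step to the left
bwd_x, bwd_y = lm_pairs(tokens, backward=True)  # same idea on the reversed text
print(fwd_x, fwd_y)
print(bwd_x, bwd_y)

# At position t, the right-to-left pass of a bidirectional layer has read tokens[t:],
# which already contains the forward target tokens[t + 1].
t = 1
seen_by_backward_pass = tokens[t:]
target_leaks = tokens[t + 1] in seen_by_backward_pass
print(target_leaks)
```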

Oh dear, that completely slipped my mind. Now that you mention it, it seems painfully obvious why we can’t train a bidirectional LM.

You can train two models, one forward and one backward, and then ensemble the results. That can help boost performance and make the final model more robust.
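
A rough sketch of the ensembling step, assuming each classifier outputs class probabilities for the same examples in the same order. The numbers and the helpers average_predictions and argmax are made up for illustration; with real models the two lists would come from the forward and the backward classifier.

```python
def average_predictions(probs_fwd, probs_bwd):
    """Average per-class probabilities from two models, example by example."""
    return [
        [(pf + pb) / 2 for pf, pb in zip(row_f, row_b)]
        for row_f, row_b in zip(probs_fwd, probs_bwd)
    ]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical probabilities for 3 examples and 2 classes.
probs_fwd = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
probs_bwd = [[0.8, 0.2], [0.3, 0.7], [0.70, 0.30]]

ensembled = average_predictions(probs_fwd, probs_bwd)
labels = [argmax(row) for row in ensembled]
print(labels)
```

In the third example the two models disagree, and the averaged probabilities side with the more confident one. Simple averaging is only one option; weighting the two models differently is another.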

Thanks for the help everyone!! Will train a backward and a forward model.

As pointed out, the monkey patch that changes the dimensions did not work for the LM-based classifier.

Hello
I am trying to train a backward model, but it’s giving the error shown in the screenshot. I tried these steps but it still shows the error.


Kindly help. Thank you.

Hi,
Can you please explain what you mean by leaving it in for classifiers that don’t use any pretrained models?
The config is called awd_lstm_clas_config. When is it used, if not in a text classifier with the AWD_LSTM arch?

Or did you mean that the bidir=True option won’t work with ULMFiT (a language model, and then a classifier built on its vocab)?