Bidirectional Code | ULMFiT

shruti_01 · June 2, 2019, 9:59am

Hi

I want to use a bidirectional language model, but when I set the bidir parameter to True, I get the following error. Please help!

Code
from fastai.text import *  # fastai v1
config = awd_lstm_lm_config.copy()
config['bidir'] = True
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, config=config, pretrained=False)
learn.lr_find()
learn.recorder.plot(skip_end=15)

Error

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in check_hidden_size(self, hx, expected_hidden_size, msg)
170 # type: (Tensor, Tuple[int, int, int], str) -> None
171 if hx.size() != expected_hidden_size:
--> 172 raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
173
174 def check_forward_args(self, input, hidden, batch_sizes):

RuntimeError: Expected hidden[0] size (2, 1, 1152), got (1, 1, 1152)

KarlH · June 3, 2019, 1:19am

Are you using an older version of the library? I remember encountering this when the _one_hidden function was hardcoded to return a tensor of size (1, bs, n_hidden), but this has since been updated.
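For reference, PyTorch's nn.LSTM expects the initial hidden state h_0 to have shape (num_layers * num_directions, batch, hidden_size), so a bidirectional layer needs a leading dimension of 2. The plain-Python sketch below only does that shape arithmetic (the helper names are hypothetical, not fastai functions), treating the failing layer as a single-layer bidirectional LSTM, which is what the leading 2 in the error suggests. It shows how a helper hardcoded to a leading 1 produces exactly the mismatch in the error message.

```python
def expected_hidden_shape(num_layers, bidirectional, batch_size, hidden_size):
    """Shape PyTorch's nn.LSTM expects for h_0 (and c_0)."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch_size, hidden_size)


def hardcoded_hidden_shape(batch_size, hidden_size):
    """Shape produced by a hidden-state helper hardcoded to (1, bs, n_hidden)."""
    return (1, batch_size, hidden_size)


# One bidirectional LSTM layer, batch size 1, hidden size 1152 (values taken from the error message)
expected = expected_hidden_shape(num_layers=1, bidirectional=True, batch_size=1, hidden_size=1152)
got = hardcoded_hidden_shape(batch_size=1, hidden_size=1152)
print(f"Expected hidden[0] size {expected}, got {got}")
```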

shruti_01 · June 3, 2019, 1:30am

I don't think so. I run pip install fastai every day because I work on Google Colab.

[EDIT] I tried pip install fastai==1.0.42, but this breaks a few functions that were working with the newer fastai. How do you find out the latest version of the library? Also, is some of the code version-specific?

dreambeats · June 3, 2019, 5:31am

Take a look here. https://pypi.org/project/fastai/
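PyPI lists the newest release; to see which version is actually installed in the current runtime, a small standard-library check can help. This is a sketch that assumes Python 3.8+ (for importlib.metadata); the helper names are made up for illustration. The comparison helper also shows why version strings should not be compared as plain text.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def at_least(installed, minimum):
    """Compare purely numeric dotted versions such as '1.0.52'.

    Comparing the strings directly is wrong ('1.0.9' > '1.0.42' as text),
    so each part is converted to an int first. Pre-release tags are not handled.
    """
    def to_tuple(v):
        return tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(minimum)


print("fastai installed:", installed_version("fastai"))
```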

shruti_01 · June 3, 2019, 6:11am

I'm still getting the same error after trying pip install fastai==1.0.52.

dreambeats · June 3, 2019, 8:53am

After some digging, it appears that although the hardcoding has been fixed in the GitHub repo, the problem is still present when you conda install the stable release of fastai.

So, to work around it, you can temporarily monkey patch the function like this:

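from functools import partial
from torch import Tensor
from fastai.torch_core import one_param  # fastai v1

def _one_hidden(self, l:int)->Tensor:
    nh = (self.n_hid if l != self.n_layers - 1 else self.emb_sz) // self.n_dir  # hidden size per direction
    return one_param(self).new(self.n_dir, self.bs, nh).zero_()  # leading dim is n_dir, not a hardcoded 1
learn.model[0]._one_hidden = partial(_one_hidden, learn.model[0])  # patch the AWD_LSTM encoder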

After that, you can run lr_find as usual and train your model.

sgugger · June 6, 2019, 1:07pm

Note that you can't train a bidirectional language model, because the targets can't be shifted one position to the left and to the right at the same time. The bidir option has been left in for classifiers that don't use any pretrained models.
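A library-independent sketch of this point, using made-up token ids and a hypothetical helper: at each position t a forward LM must predict the next token and a backward LM the previous one, so a single output cannot serve both targets. In addition, a bidirectional layer concatenates both directions at t, so each target is already inside the context the model has read, which would make predicting it trivial.

```python
def per_position_targets(ids):
    """For each inner position t, list the forward and backward LM targets and
    whether a bidirectional layer would already have read each target."""
    n = len(ids)
    rows = []
    for t in range(1, n - 1):
        fwd_context = range(0, t + 1)   # indices a left-to-right RNN has read at t
        bwd_context = range(t, n)       # indices a right-to-left RNN has read at t
        rows.append({
            "t": t,
            "fwd_target": ids[t + 1],   # forward LM: next token (targets shifted left by 1)
            "bwd_target": ids[t - 1],   # backward LM: previous token (targets shifted right by 1)
            # concatenating both directions exposes each target to the model
            "fwd_target_seen": (t + 1) in bwd_context,
            "bwd_target_seen": (t - 1) in fwd_context,
        })
    return rows


for row in per_position_targets([10, 11, 12, 13, 14]):
    print(row)
```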

dreambeats · June 6, 2019, 7:22pm

Oh dear, that completely slipped my mind. Now that you mention it, it seems painfully obvious why we can't train a bidirectional LM.

bfarzin · June 6, 2019, 8:18pm

You can train two models, one forward and one backward, and then ensemble the results. That can help boost performance and make the final model more robust.
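A minimal sketch of the ensembling step, assuming you already have class probabilities from the forward and the backward classifier for the same samples in the same order (how you obtain them is outside this sketch). It uses a weighted average of probabilities, which is one common choice and not necessarily the exact scheme meant here; the helper names and the toy numbers are made up for illustration.

```python
import numpy as np


def ensemble_probs(fwd_probs, bwd_probs, weight=0.5):
    """Weighted average of (n_samples, n_classes) probabilities from two models.

    `weight` is the share given to the forward model; rows must be aligned.
    """
    fwd = np.asarray(fwd_probs, dtype=float)
    bwd = np.asarray(bwd_probs, dtype=float)
    if fwd.shape != bwd.shape:
        raise ValueError(f"shape mismatch: {fwd.shape} vs {bwd.shape}")
    return weight * fwd + (1.0 - weight) * bwd


def accuracy(probs, labels):
    """Fraction of rows whose arg-max class equals the label."""
    return float((np.argmax(probs, axis=1) == np.asarray(labels)).mean())


# Toy probabilities for 3 samples and 2 classes
fwd = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
bwd = [[0.45, 0.55], [0.8, 0.2], [0.2, 0.8]]
labels = [0, 0, 1]
print(accuracy(fwd, labels), accuracy(bwd, labels), accuracy(ensemble_probs(fwd, bwd), labels))
```

The `weight` parameter can be tuned on a validation set if one direction is consistently stronger than the other.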

shruti_01 · June 7, 2019, 12:01am

Thanks for the help everyone!! Will train a backward and a forward model.

As pointed out, the monkey patch to change the dimensions was not working for the LM-then-classifier setup.
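A backward model is an ordinary language model trained on reversed text. The plain-Python sketch below (a hypothetical helper with toy tokens, not necessarily how fastai implements backward models) builds the input/target pairs by reversing each tokenized document, concatenating the documents into one stream, and shifting by one token. A classifier fine-tuned on top of a backward LM would need its input text reversed in the same way.

```python
def lm_pairs(docs, backwards=False):
    """Concatenate tokenized documents into one stream and return (inputs, targets),
    where targets are the inputs shifted by one token.

    With backwards=True each document's tokens are reversed first, so the model
    learns to predict the previous token of the original text.
    """
    stream = []
    for doc in docs:
        stream.extend(reversed(doc) if backwards else doc)
    return stream[:-1], stream[1:]


docs = [["xxbos", "the", "cat", "sat"], ["xxbos", "a", "dog"]]
fwd_inputs, fwd_targets = lm_pairs(docs)
bwd_inputs, bwd_targets = lm_pairs(docs, backwards=True)
print(bwd_inputs)
print(bwd_targets)
```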

ulmfitter · January 22, 2020, 8:49am

Hello
I am trying to train a backward model, but it's giving an error, as shown in the screenshot. I tried these steps, but it still shows the error.

Kindly help. Thank you.

Naama_A · December 8, 2020, 8:48pm

Hi,
can you please explain what you mean by leaving it for classifiers that don't use any pretrained models?
The option is in awd_lstm_clas_config. When is it used, if not in a text classifier with the AWD_LSTM architecture?

Or did you mean that the bidir=True option won't work with ULMFiT (a language model, then a classifier based on that vocab)?