Bidirectional Code | ULMFiT

(Shruti Mittal) #1

Hi

I want to use a bidirectional language model. When I set the bidir parameter to True, I get the error below. Please help!

Code
config = awd_lstm_lm_config.copy()
config['bidir'] = True  # request a bidirectional AWD-LSTM
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, config=config, pretrained=False)
learn.lr_find()
learn.recorder.plot(skip_end=15)

Error

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in check_hidden_size(self, hx, expected_hidden_size, msg)
170 # type: (Tensor, Tuple[int, int, int], str) -> None
171 if hx.size() != expected_hidden_size:
--> 172 raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
173
174 def check_forward_args(self, input, hidden, batch_sizes):

RuntimeError: Expected hidden[0] size (2, 1, 1152), got (1, 1, 1152)

2 Likes

(Karl) #3

Are you using an older version of the library? I remember encountering this when the _one_hidden function was hardcoded to return a tensor of size (1, bs, n_hidden), but this has since been updated.

0 Likes

(Shruti Mittal) #4

I don’t think so. I pip install fastai daily because I work on Google Colab.

[EDIT] I tried pip install fastai==1.0.42, but it breaks a few functions that were working before. How do I find out the latest version of the library? And is there version-specific code I should be aware of?

0 Likes

(JamesT) #5

Take a look here: https://pypi.org/project/fastai/
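And if you want to check which version is actually installed in your Colab runtime, a quick sanity check like this should do (nothing fastai-specific beyond the standard __version__ attribute):

import fastai
print(fastai.__version__)   # prints the installed version, e.g. '1.0.52'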

0 Likes

(Shruti Mittal) #6

Still getting the same error after trying pip install fastai==1.0.52.

0 Likes

(JamesT) #7

After some digging around, it appears that although the hardcoding was fixed in the GitHub repo, the problem is still present when you conda-install the stable release of fastai.

So to fix this problem you can temporarily monkey-patch it this way:

from functools import partial
from torch import Tensor
from fastai.torch_core import one_param  # already in scope if you use from fastai.text import *
def _one_hidden(self, l:int)->Tensor:
    # zero hidden state sized per direction: (n_dir, bs, nh) rather than the hardcoded (1, bs, nh)
    nh = (self.n_hid if l != self.n_layers - 1 else self.emb_sz) // self.n_dir
    return one_param(self).new(self.n_dir, self.bs, nh).zero_()
learn.model[0]._one_hidden = partial(_one_hidden, learn.model[0])

After that you can run lr_find as usual and train your model.

1 Like

#8

Note that you can’t train a bidirectional language model, since the targets can’t be shifted one to the left and one to the right at the same time. The bidir option has been left in for classifiers not using any pretrained model.
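To make that concrete, here is a tiny hypothetical illustration (the token list and variable names are made up) of why one set of targets can’t serve both directions:

tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']
fwd_targets = tokens[1:]    # forward LM: predict the next token (shift left)
bwd_targets = tokens[:-1]   # backward LM: predict the previous token (shift right)
# One batch of targets can't be both shifts at once, so bidir only makes
# sense for the classifier, which doesn't predict shifted tokens at all.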

3 Likes

(JamesT) #9

Oh dear, that completely slipped my mind. Now that you mention it, it seems painfully obvious why we can’t train a bidirectional LM.

1 Like

(Bobak Farzin) #10

You can train two models, one forward and one backward, and then ensemble the results. That can help boost performance and makes the final model more robust.
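For instance, a rough sketch of the ensembling step, assuming you already have two trained classifiers named learn_fwd and learn_bwd (those names are made up here), could look like this:

from fastai.metrics import accuracy
preds_fwd, targets = learn_fwd.get_preds(ordered=True)  # forward-model class probabilities
preds_bwd, _ = learn_bwd.get_preds(ordered=True)         # backward-model class probabilities
avg_preds = (preds_fwd + preds_bwd) / 2                   # simple averaging ensemble
print(accuracy(avg_preds, targets))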

3 Likes

(Shruti Mittal) #11

Thanks for the help, everyone! I’ll train a forward and a backward model.

The monkey patch to change the dimensions wasn’t working for the LM-classifier, as pointed out above.

1 Like