Correct way to use a bidirectional AWD LSTM language model learner?

Is there an official way to implement a bidirectional AWD LSTM? I used the following code, but the error below pops up.

config = awd_lstm_lm_config.copy()
config['bidir'] = True
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, model_dir=".", config=config, pretrained=False)
learn.lr_find()
learn.recorder.plot(skip_end=15)

The error says the hidden size should be doubled; does that mean I have to modify my code when creating the DataBunch?

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/fastai/text/models/awd_lstm.py in forward(self, input, from_embeddings)
    112         new_hidden,raw_outputs,outputs = [],[],[]
    113         for l, (rnn,hid_dp) in enumerate(zip(self.rnns, self.hidden_dps)):
--> 114             raw_output, new_h = rnn(raw_output, self.hidden[l])
    115             new_hidden.append(new_h)
    116             raw_outputs.append(raw_output)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/fastai/text/models/awd_lstm.py in forward(self, *args)
     47             #To avoid the warning that comes because the weights aren't flattened.
     48             warnings.simplefilter("ignore")
---> 49             return self.module.forward(*args)
     50 
     51     def reset(self):

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    173                 hx = (hx, hx)
    174 
--> 175         self.check_forward_args(input, hx, batch_sizes)
    176         _impl = _rnn_impls[self.mode]
    177         if batch_sizes is None:

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
    150         if self.mode == 'LSTM':
    151             check_hidden_size(hidden[0], expected_hidden_size,
--> 152                               'Expected hidden[0] size {}, got {}')
    153             check_hidden_size(hidden[1], expected_hidden_size,
    154                               'Expected hidden[1] size {}, got {}')

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_hidden_size(hx, expected_hidden_size, msg)
    146         def check_hidden_size(hx, expected_hidden_size, msg='Expected hidden size {}, got {}'):
    147             if tuple(hx.size()) != expected_hidden_size:
--> 148                 raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
    149 
    150         if self.mode == 'LSTM':

RuntimeError: Expected hidden[0] size (2, 32, 575), got (1, 32, 575)

I also stumbled upon this. Is there any proper sample code for using a bidirectional AWD_LSTM?

@kachun1017 Hey, did you get this working? I am also getting the same error!

Sorry man, I couldn’t.
But I got the TransformerXL working and switched to it.
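(For anyone reading along: swapping the architecture in language_model_learner looks roughly like the sketch below. It uses fastai v1's default tfmerXL_lm_config and is not taken from the poster's actual notebook, so treat the hyperparameters as placeholders.)

# Rough sketch of switching the LM architecture to TransformerXL (fastai v1, defaults assumed)
config_txl = tfmerXL_lm_config.copy()
learn_txl = language_model_learner(data_lm, TransformerXL, config=config_txl,
                                   drop_mult=0.3, pretrained=False)
learn_txl.fit_one_cycle(1, 1e-2)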

Nice, how did you choose the hyperparameters for the TransformerXL model?

Btw, woman here :slight_smile:

If you’re in Part 2, you can see the solution here. If not, the solution, per @dreambeats, is to monkey-patch _one_hidden, which is broken in the latest release.

from functools import partial  # `one_param` and `Tensor` are already in scope after `from fastai.text import *`

def _one_hidden(self, l:int)->Tensor:
    nh = (self.n_hid if l != self.n_layers - 1 else self.emb_sz) // self.n_dir
    # use `self.n_dir` (2 for a bidirectional model) instead of the hard-coded 1 the release ships with
    return one_param(self).new(self.n_dir, self.bs, nh).zero_()

learn.model[0]._one_hidden = partial(_one_hidden, learn.model[0])
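To double-check that the patch took effect before running lr_find, you can reset the encoder and inspect the hidden-state shapes (a quick sketch; reset() rebuilds self.hidden through _one_hidden):

learn.model[0].reset()
print(learn.model[0].hidden[0][0].shape)  # first dim should now be 2 (n_dir), not 1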

Thanks a lot! You are my lifesaver!

Hi there, the code works when I am training a language model, but not on the classifier, and I really can’t spot the problem. Do you know why? Thanks a lot.

config = awd_lstm_clas_config.copy()
config['bidir'] = True
learn = text_classifier_learner(data_clas, AWD_LSTM, config=config, drop_mult=0.5, wd=1e-2, pretrained=False).to_fp16()

def _one_hidden(self, l:int)->Tensor:
    nh = (self.n_hid if l != self.n_layers - 1 else self.emb_sz) // self.n_dir
    return one_param(self).new(self.n_dir, self.bs, nh).zero_()
learn.model[0]._one_hidden = partial(_one_hidden, learn.model[0])

learn.load_encoder('fine_tuned_enc')
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))

RuntimeError: Expected hidden[0] size (2, 64, 575), got (1, 64, 575)


Take a look at this; it’s what you should do instead.


Hi @dreambeats, the GitHub link is not working. It would be very helpful if you could share the code here.

Hi,

I believe that if you want to train a bidirectional AWD-LSTM, you just have to update the value in the config dictionary that you pass into the learner function:

config_lm = awd_lstm_lm_config.copy()
config_lm.update({'bidir': True})
learn_bidir = language_model_learner(dls_classify, AWD_LSTM, config=config_lm, pretrained=False,
                                     drop_mult=0.5, metrics=accuracy).to_fp16()
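A quick way to sanity-check that the flag took effect (a sketch; the exact module layout may differ between fastai versions) is to print the encoder's LSTM layers, whose repr should now include bidirectional=True:

# model[0] is the AWD_LSTM encoder; its rnns are WeightDropout-wrapped nn.LSTM layers
print(learn_bidir.model[0].rnns)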

However, a bidirectional language model is not really feasible: the model can then see the very token it is supposed to predict, so it cannot learn meaningful parameters. And since a bidirectional LM cannot be trained, you need a lot of data to build a bidirectional classifier, because you cannot make use of transfer learning.

A better approach is to train two separate classifiers, one that reads the text left-to-right and another that reads it right-to-left, and then average their predictions, as sketched below.
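A minimal sketch of that setup, assuming fastai v1, DataFrames df_trn / df_val, an existing data_lm, and a TextClasDataBunch that accepts the backwards flag (available in recent 1.0.x releases; check your version):

# Sketch only: one classifier on normal token order, one on reversed order
data_fwd = TextClasDataBunch.from_df(path="", train_df=df_trn, valid_df=df_val,
                                     vocab=data_lm.train_ds.vocab, bs=32)
data_bwd = TextClasDataBunch.from_df(path="", train_df=df_trn, valid_df=df_val,
                                     vocab=data_lm.train_ds.vocab, bs=32, backwards=True)

learn_fwd = text_classifier_learner(data_fwd, AWD_LSTM, drop_mult=0.5)
learn_bwd = text_classifier_learner(data_bwd, AWD_LSTM, drop_mult=0.5)
# fine-tune each one (ideally loading a matching fine-tuned encoder first),
# then average their predictions at inference time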

Hope this helps


Thanks @sebastiaan for your kind reply.
I am using it for NLP binary classification (0, 1).
What I am doing is:

config= awd_lstm_lm_config.copy()
config['bidir'] = True
learn = language_model_learner(data_lm, arch=AWD_LSTM, pretrained = False, drop_mult=0.5, config=config )
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('ft_enc')



config= awd_lstm_clas_config.copy()
config['bidir'] = True
learn = text_classifier_learner(data_clas, arch=AWD_LSTM, drop_mult=0.7, config=config)
learn.load_encoder('ft_enc')
learn.fit_one_cycle(1, 1e-2)

If you could elaborate a bit more, that would be great.

Maybe try giving each config dictionary a different name; perhaps it is somehow referencing the wrong one?

And did you pass the vocab of data_lm when defining data_clas?

Thanks for the suggestion.
Yes, I passed the training vocab, but I still didn’t get the required results.

# Language model data

data_lm = TextLMDataBunch.from_df(train_df = df_trn, valid_df = df_val, path = "")

# Classifier model data

data_clas = TextClasDataBunch.from_df(path = "", train_df = df_trn, valid_df = df_val, vocab=data_lm.train_ds.vocab, bs=32)

Bidirectional training of the AWD-LSTM is not supported. But you can achieve better results by training a forward and a backwards model and using an ensemble of both during inference / prediction.

Here’s an example for the ensemble … you can also find the training notebooks in the repo.
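For reference, the averaging step itself can be as simple as the sketch below, assuming two fine-tuned fastai v1 learners learn_fwd and learn_bwd built on forward and backwards versions of the same validation set (ordered=True keeps the predictions aligned with the dataset order):

# Average the class probabilities of the forward and backwards classifiers (fastai v1 sketch)
preds_fwd, targs = learn_fwd.get_preds(ds_type=DatasetType.Valid, ordered=True)
preds_bwd, _     = learn_bwd.get_preds(ds_type=DatasetType.Valid, ordered=True)

preds_ens = (preds_fwd + preds_bwd) / 2
print('forward :', accuracy(preds_fwd, targs))
print('backward:', accuracy(preds_bwd, targs))
print('ensemble:', accuracy(preds_ens, targs))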
