Problem with lr_find when using both bidirectional and qrnn

#1

Hi, as described in the title, I’m facing the following error when enabling both bidirectional and qrnn in awd_lstm_lm_config.
RuntimeError: The expanded size of the tensor (1150) must match the existing size (575) at non-singleton dimension 1. Target sizes: [64, 1150]. Tensor sizes: [64, 575]
All other parameters are left at their defaults.

Does fastai support bidirectional qrnn?

0 Likes

#2

This bug was recently fixed in master. Note that, as explained in this topic, it is impossible to train a bidirectional language model. You can train a bidirectional classifier on the other hand.

2 Likes

#3

Thanks for the response. When I click on the link you mentioned, I get ‘Sorry, you don’t have access to that topic!’.
If possible, I would like to understand why it’s impossible to create a bidirectional language model. I also have a few more questions:
1) Would a bidirectional QRNN work with a text classifier?
2) A language model with a bidirectional LSTM gives no errors, so is the error specific to QRNN?

0 Likes

#4

Oh sorry, it’s not public yet; it will be once the latest part 2 of the course is released as a MOOC. A language model’s targets are the input shifted by one word: to the left for a forward model, to the right for a backward model. You can’t have both at the same time, hence no bidirectional LM. Note that this won’t raise an error: the code will run, but the model won’t train properly.
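
One way to picture the shifted targets is the plain-Python sketch below. It is not fastai code: the helper names `forward_lm_pairs`, `backward_lm_pairs` and `bidirectional_leaks` are made up for illustration. It also shows one common explanation of why combining both directions fails: at every position, the backward direction has already read the word the forward direction is asked to predict.

```python
def forward_lm_pairs(tokens):
    """Forward LM: at position i the model has read tokens[:i+1] and predicts tokens[i+1]."""
    return [(tokens[:i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]


def backward_lm_pairs(tokens):
    """Backward LM: at position i the model has read tokens[i:] and predicts tokens[i-1]."""
    return [(tokens[i:], tokens[i - 1]) for i in range(1, len(tokens))]


def bidirectional_leaks(tokens):
    """For each forward target, report whether the backward pass at the
    same position has already read that target (i.e. the answer leaks)."""
    leaks = []
    for i in range(len(tokens) - 1):
        target_index = i + 1                       # what the forward LM must predict
        backward_context = range(i, len(tokens))   # positions the backward pass has read
        leaks.append(target_index in backward_context)
    return leaks


tokens = ["the", "cat", "sat", "down"]
print(forward_lm_pairs(tokens)[0])        # (['the'], 'cat')
print(backward_lm_pairs(tokens)[-1])      # (['down'], 'sat')
print(all(bidirectional_leaks(tokens)))   # True
```

A classifier has no such per-position target, which is why the bidirectional restriction only applies to the language model.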

For the second point, it means it’s not the same error as before. Could you give the full stack trace?

0 Likes

#5

RuntimeError Traceback (most recent call last)
in ()
----> 1 learn.lr_find()

~/punchh/sensus/notebooks/chetan/fastai/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, wd)
30 cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
31 epochs = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 32 learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
33
34 def to_fp16(learn:Learner, loss_scale:float=None, max_noskip:int=1000, dynamic:bool=True, clip:float=None,

~/punchh/sensus/notebooks/chetan/fastai/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
198 callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
199 if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 200 fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
201
202 def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/punchh/sensus/notebooks/chetan/fastai/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
99 for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
100 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
102 if cb_handler.on_batch_end(loss): break
103

~/punchh/sensus/notebooks/chetan/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
24 if not is_listy(xb): xb = [xb]
25 if not is_listy(yb): yb = [yb]
---> 26 out = model(*xb)
27 out = cb_handler.on_loss_begin(out)
28

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input
94

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/awd_lstm.py in forward(self, input, from_embeddings)
116 new_hidden,raw_outputs,outputs = [],[],[]
117 for l, (rnn,hid_dp) in enumerate(zip(self.rnns, self.hidden_dps)):
--> 118 raw_output, new_h = rnn(raw_output, self.hidden[l])
119 new_hidden.append(new_h)
120 raw_outputs.append(raw_output)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/qrnn.py in forward(self, inp, hid)
200 if self.bidirectional: inp_bwd = inp.clone()
201 for i, layer in enumerate(self.layers):
--> 202 inp, h = layer(inp, None if hid is None else hid[2*i if self.bidirectional else i])
203 new_hid.append(h)
204 if self.bidirectional:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/qrnn.py in forward(self, inp, hid)
148 if self.backward: forget_mult = dispatch_cuda(BwdForgetMultGPU, partial(forget_mult_CPU, backward=True), inp)
149 else: forget_mult = dispatch_cuda(ForgetMultGPU, forget_mult_CPU, inp)
--> 150 c_gate = forget_mult(z_gate, f_gate, hid, self.batch_first)
151 output = torch.sigmoid(o_gate) * c_gate if self.output_gate else c_gate
152 if self.window > 1 and self.save_prev_x:

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/qrnn.py in forward(ctx, x, f, hidden_init, batch_first)
59 output = f.new_zeros(batch_size, seq_size + 1, hidden_size)
60
---> 61 if hidden_init is not None: output[:, 0] = hidden_init
62 else: output.zero_()
63 else:

RuntimeError: The expanded size of the tensor (1150) must match the existing size (575) at non-singleton dimension 1. Target sizes: [64, 1150]. Tensor sizes: [64, 575]

This is the full trace. I installed the fastai developer version into my own directory and am calling the functions from there.
I’m not sure why the tensor size is being halved.

0 Likes

#6

Oh, there was a bug in the way the hidden size was computed in the case of QRNN. It should be fixed now.
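
For readers wondering where 575 versus 1150 comes from: 575 is exactly 1150 / 2, which is what you would get if the hidden size were split across two directions in one place but not in another. The sketch below is plain Python, not the fastai implementation. It assumes the forget-mult step computes a recurrence of the form h_t = f_t * x_t + (1 - f_t) * h_{t-1}, and the function name and the explicit size check are illustrative only.

```python
def forget_mult(x, f, hidden_init=None):
    """Sequential QRNN-style pooling for a single sequence.

    x and f are lists of length seq_len; each item is a list of
    hidden_size floats. Assumed recurrence:
        h_t = f_t * x_t + (1 - f_t) * h_{t-1}
    With no initial state, the first step is simply f_0 * x_0.
    """
    hidden_size = len(x[0])
    if hidden_init is not None and len(hidden_init) != hidden_size:
        raise ValueError(
            f"hidden_init has size {len(hidden_init)}, expected {hidden_size}"
        )
    prev, out = hidden_init, []
    for x_t, f_t in zip(x, f):
        if prev is None:
            prev = [fi * xi for xi, fi in zip(x_t, f_t)]
        else:
            prev = [fi * xi + (1 - fi) * pi for xi, fi, pi in zip(x_t, f_t, prev)]
        out.append(prev)
    return out


n_hid, n_dir = 1150, 2           # sizes taken from the error message above
per_direction = n_hid // n_dir   # 575

x = [[1.0] * n_hid]              # one time step, full-width gates
f = [[0.5] * n_hid]
try:
    # Initial state sized per direction, gates sized for the full width.
    forget_mult(x, f, hidden_init=[0.0] * per_direction)
except ValueError as err:
    print(err)   # hidden_init has size 575, expected 1150
```

The point is only that the initial hidden state and the gate tensors must agree on the per-direction hidden size.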

0 Likes

#7

Now there’s a different error.

RuntimeError Traceback (most recent call last)
in ()
----> 1 learn.lr_find()

~/punchh/sensus/notebooks/chetan/fastai/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, wd)
30 cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
31 epochs = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 32 learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
33
34 def to_fp16(learn:Learner, loss_scale:float=None, max_noskip:int=1000, dynamic:bool=True, clip:float=None,

~/punchh/sensus/notebooks/chetan/fastai/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
198 callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
199 if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 200 fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
201
202 def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/punchh/sensus/notebooks/chetan/fastai/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
99 for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
100 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
102 if cb_handler.on_batch_end(loss): break
103

~/punchh/sensus/notebooks/chetan/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
24 if not is_listy(xb): xb = [xb]
25 if not is_listy(yb): yb = [yb]
---> 26 out = model(*xb)
27 out = cb_handler.on_loss_begin(out)
28

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input
94

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/awd_lstm.py in forward(self, input, from_embeddings)
115 new_hidden,raw_outputs,outputs = [],[],[]
116 for l, (rnn,hid_dp) in enumerate(zip(self.rnns, self.hidden_dps)):
--> 117 raw_output, new_h = rnn(raw_output, self.hidden[l])
118 new_hidden.append(new_h)
119 raw_outputs.append(raw_output)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/qrnn.py in forward(self, inp, hid)
156 if self.bidirectional: inp_bwd = inp.clone()
157 for i, layer in enumerate(self.layers):
--> 158 inp, h = layer(inp, None if hid is None else hid[2*i if self.bidirectional else i])
159 new_hid.append(h)
160 if self.bidirectional:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/qrnn.py in forward(self, inp, hid)
99
100 def forward(self, inp, hid=None):
--> 101 y = self.linear(self._get_source(inp))
102 if self.output_gate: z_gate,f_gate,o_gate = y.chunk(3, dim=2)
103 else: z_gate,f_gate = y.chunk(2, dim=2)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

~/punchh/sensus/notebooks/chetan/fastai/fastai/text/models/awd_lstm.py in forward(self, *args)
50 #To avoid the warning that comes because the weights aren't flattened.
51 warnings.simplefilter("ignore")
---> 52 return self.module.forward(*args)
53
54 def reset(self):

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
90 @weak_script_method
91 def forward(self, input):
---> 92 return F.linear(input, self.weight, self.bias)
93
94 def extra_repr(self):

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1406 ret = torch.addmm(bias, input, weight.t())
1407 else:
-> 1408 output = input.matmul(weight.t())
1409 if bias is not None:
1410 output += bias

RuntimeError: size mismatch, m1: [4480 x 575], m2: [1150 x 1725] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:268

I know we shouldn’t use bidirectional for a language model, but the tensor size mismatch is still weird. Is this an error on my side or in the QRNN code?

0 Likes

#8

Eh, the bidir flag wasn’t passed along to the QRNN; fixed that too.
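
The numbers in the size-mismatch message can be decoded with a little arithmetic. The sketch below is plain Python rather than fastai code; `describe_linear_mismatch` is a hypothetical helper, the three gates correspond to the `y.chunk(3, dim=2)` line in the trace, and the sequence length of 70 is an assumption (the batch size of 64 comes from the first error message).

```python
def describe_linear_mismatch(m1, m2, n_gates=3):
    """Interpret 'size mismatch, m1: [rows x in_feats], m2: [expected_in x out_feats]'.

    m1 is the flattened input to the linear layer, m2 is weight.t().
    n_gates is the number of gates stacked in the output (z, f, o here).
    """
    rows, in_feats = m1
    expected_in, out_feats = m2
    return {
        "compatible": in_feats == expected_in,
        "input_features": in_feats,
        "expected_features": expected_in,
        "hidden_per_gate": out_feats // n_gates,
    }


info = describe_linear_mismatch((4480, 575), (1150, 1725))
print(info["compatible"])        # False
print(info["hidden_per_gate"])   # 575
# 4480 rows is consistent with 64 sequences of length 70 (length assumed).
print(64 * 70 == 4480)           # True
# The layer expects twice as many features as it received.
print(info["expected_features"] == 2 * info["input_features"])   # True
```

Read this way, the layer was built to receive a 1150-wide input (two 575-wide directions concatenated) but got a single 575-wide direction, which is consistent with the bidir flag not reaching the QRNN.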

0 Likes

#9

I think save_prev_x should be set to False when bidirectional is enabled in the QRNN class. Otherwise it throws an assertion error:
AssertionError: Can’t save the previous X with bidirectional

0 Likes

#10

If we could just stop trying to use that architecture with bidirectional :wink: It won’t work, for the same reason save_prev_x can’t work with bidirectional: a language model can’t be bidirectional.

0 Likes

#11

Yeah makes sense.

Also, in the QRNN class, the variable save_prev_x is not passed when calling QRNNLayer, so wouldn’t it always be False (the default value)?
Maybe I’m missing something

0 Likes

#12

Looks like another mistake, thanks for flagging!

0 Likes

#13

Just wanted to know whether the bugs in QRNN have been fixed and whether I can use QRNN to train a model.

0 Likes

#14

Oh, had forgotten about that. Just pushed a fix.

0 Likes

#15

Great, thanks!!

0 Likes

(Bobak Farzin) #16

Not sure if this is related. I’m having trouble with forget_mult_cuda_forward when using a QRNN language model with to_fp16(). Any idea what I have set up wrong?

I pulled version 1.0.54.dev0 from github.

from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
data = TextLMDataBunch.from_csv(path, 'texts.csv')
config = awd_lstm_lm_config.copy()
config['qrnn'] = True
learn = language_model_learner(data, AWD_LSTM, drop_mult=0.5,
                               pretrained=False, config=config)
learn = learn.to_fp16(dynamic=True)
learn.fit_one_cycle(1)

error:

~/.conda/envs/fastaiv1_dev/lib/python3.7/site-packages/fastai/text/models/qrnn.py in forward(ctx, x, f, hidden_init, batch_first)
     30             if hidden_init is not None: output[0] = hidden_init
     31             else: output.zero_()
---> 32         output = forget_mult_cuda.forward(x, f, output, batch_first)
     33         ctx.save_for_backward(x, f, hidden_init, output)
     34         ctx.batch_first = batch_first

RuntimeError: "forget_mult_cuda_forward" not implemented for 'Half' (operator() at /home/bfarzin/.conda/envs/fastaiv1_dev/lib/python3.7/site-packages/fastai/text/models/forget_mult_cuda_kernel.cu:85)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7faa51af2dc5 in /home/bfarzin/.conda/envs/fastaiv1_dev/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x60cbb (0x7faa9f78fcbb in /tmp/torch_extensions/forget_mult_cuda/forget_mult_cuda.so)
0 Likes

#17

You can’t do QRNNs in half-precision because this hasn’t been implemented (as the error indicates quite clearly :wink: )
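
In practice that means leaving out the `to_fp16` call whenever `config['qrnn']` is True. A minimal guard could look like the sketch below; `should_use_fp16` is a hypothetical helper, not part of fastai, and the only thing it relies on from this thread is the `'qrnn'` config key.

```python
def should_use_fp16(config, want_fp16=True):
    """Return True only if half precision was requested and the model
    is not a QRNN (the forget-mult CUDA kernel has no Half support)."""
    return bool(want_fp16) and not config.get("qrnn", False)


config = {"qrnn": True}
print(should_use_fp16(config))   # False

# Hypothetical usage with the learner from the post above:
# if should_use_fp16(config):
#     learn = learn.to_fp16(dynamic=True)
```

With the guard in place, the QRNN model trains in full precision, while a non-QRNN config can still opt in to half precision.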

1 Like