PyTorch internal error while doing the imdb notebook

t-v · May 22, 2018, 7:42am

Hi,

I’m working through the imdb notebook in Lesson 10 and get a PyTorch error.
At the first fine-tuning step under “Language model” (learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)) I get:

RuntimeError: range.second - range.first == t.size() ASSERT FAILED at torch/csrc/autograd/generated/Functions.cpp:47, please report a bug to PyTorch. inconsistent range for TensorList output

While that is clear advice and I’ll try to drill down to the root cause, has anyone else seen this and already solved it?
(I’m using PyTorch/master, so some breakage is OK.)

Best regards

Thomas

urmas.pitsi · May 22, 2018, 8:33am

I have PyTorch 0.3.1 and it works fine. I guess you are trying to run it with the latest PyTorch version? I think fast.ai can break with anything above PyTorch 0.3.1, though I see there are recent commits aimed at 0.4 compatibility.

t-v · May 22, 2018, 8:55am

Thanks for the hint! I’m relatively attached to running PyTorch / master.
I have dug a bit further: apparently the cuDNN RNN backward pass behaves strangely depending on which grads are enabled and which are not.
So if anyone else runs into this: setting torch.backends.cudnn.enabled = False gets you around this.
Apparently there still is a bug in PyTorch to be fixed, though.
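
A minimal sketch of that workaround, assuming the flag is set before the model is built or trained. The tiny LSTM and its sizes are made up for illustration and are not taken from the notebook. On a GPU machine the same flag makes RNN layers fall back to PyTorch's native kernels, which is usually slower than cuDNN.

```python
import torch
import torch.nn as nn

# Workaround from this thread: switch cuDNN off globally so that RNN layers
# use PyTorch's native implementation instead of the cuDNN one.
torch.backends.cudnn.enabled = False

# Illustrative stand-in for the language model (hypothetical sizes).
rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=1)

# Freeze one weight to mimic "some grads enabled, some not".
rnn.weight_ih_l0.requires_grad = False

x = torch.randn(5, 3, 8)  # (seq_len, batch, features)
output, _ = rnn(x)
output.sum().backward()

# The frozen weight receives no gradient, the trainable one does.
print("cudnn enabled:", torch.backends.cudnn.enabled)
print("frozen weight has grad:", rnn.weight_ih_l0.grad is not None)
print("trainable weight has grad:", rnn.weight_hh_l0.grad is not None)
```

Setting the flag back to True once the PyTorch fix is installed restores the faster cuDNN path.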

t-v · May 22, 2018, 9:09am

Sure enough @sgugger found it first:

sgugger · May 22, 2018, 11:28am

Yes, someone else mentioned this error on the lesson wiki so I was trying to fix it, but it turns out the problem isn’t on our side.
It seems you have found a fix though, congrats!

t-v · May 22, 2018, 12:58pm

Heh, I guess I’m on the PyTorch side, too… Thanks for all your work! I was just checking out how you proceeded with French in the language Zoo, so I could do the same for German, when this blew up on me.

sgugger · May 22, 2018, 1:18pm

I’ve learned a lot more on training an LM and making super-convergence work on them, so I definitely have to share a notebook on what I found worked best.

t-v · May 22, 2018, 5:36pm

I look forward to that! My plan was to use sentencepiece as I hope to benefit from the subwords with German compound nouns - I did have lots of UNK in some previous experiments and it also seems to work well for OpenNMT.

In the meantime: Did I miss something about more compatibility issues?
I’m now at learner.lr_find(start_lr=lrs/10, end_lr=lrs*10, linear=True) and get

TypeError: cannot assign 'torch.cuda.FloatTensor' as parameter 'weight_hh_l0' (torch.nn.Parameter or None expected)

I think I should know how to fix it if no one else has yet.

sgugger · May 22, 2018, 10:18pm

Weird, this ran fine for me (PyTorch 0.4.0).

t-v · May 23, 2018, 9:13am

I’m pretty sure the nn.Module code in PyTorch wants to keep you from overwriting a parameter with a variable, as the regularizer does, but whether that works might depend on other factors like the Python version. (There is a PyTorch issue where I have opinions on how to deal with “calculated parameters”, which would be a neat solution here, too.) Interestingly, the code has been there for a long time.

I made it work (though not without some _raw vs. not-so-_raw hacking in load_module). I’m not sure that I want to impose it on you if the problem doesn’t exist for anyone else.
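
To illustrate the nn.Module check behind this TypeError, here is a small sketch. It uses nn.Linear as a stand-in rather than the notebook's LSTM, and the name `weight_raw` is hypothetical. The workaround shown (unregister the original name, keep the trainable weight under a second name) is an assumption about the general approach and not a copy of the PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(4, 3)

# A weight with dropout applied is a plain tensor, not an nn.Parameter.
dropped = F.dropout(layer.weight, p=0.5, training=True)

# nn.Module rejects it because 'weight' is a registered parameter.
message = None
try:
    layer.weight = dropped
except TypeError as err:
    message = str(err)
print(message)

# Sketch of a workaround: move the trainable weight to a second name and
# unregister the original, so 'weight' can hold a computed tensor.
raw = layer.weight
del layer._parameters["weight"]
layer.register_parameter("weight_raw", raw)

# Now this is an ordinary attribute assignment and no longer raises.
layer.weight = F.dropout(layer.weight_raw, p=0.5, training=True)

out = layer(torch.randn(2, 4))
out.sum().backward()

# Gradients flow through the computed weight back to the raw parameter.
print(sorted(name for name, _ in layer.named_parameters()))
print(layer.weight_raw.grad is not None)
```

One consequence of this pattern is that saved state dicts contain `weight_raw` instead of `weight`, so loading older checkpoints needs the key renamed.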

ericroland · May 25, 2018, 8:36pm

I am getting the same error:

TypeError: cannot assign 'torch.cuda.FloatTensor' as parameter 'weight_hh_l0' (torch.nn.Parameter or None expected)

How did you get around it? I’m also running PyTorch 0.4.0.

Thank you!

t-v · May 25, 2018, 8:41pm

I submitted my fix as a PR and a bit more description here:

Out of curiosity: What Python version are you on?

Best regards

Thomas

ericroland · May 25, 2018, 8:43pm

3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0]

Running on SageMaker. Your PR fixes my issue.

I appreciate the help.

t-v · May 26, 2018, 5:28am

Hi Eric,

Cool! Thank you for reporting back.

Best regards

Thomas

todd · May 28, 2018, 1:19am

Thomas, your pull request patch also fixed the problem for me.

I am running Python 3.6.5 and PyTorch 0.3.1, and my fastai repo was current as of today.

Running a 1080 Ti here, locally on my Linux Mint machine.

Thanks much.

prajjwal1 · June 14, 2018, 12:06pm

I’m still getting the “inconsistent range for TensorList output” error. Has it been fixed? I’m on PyTorch master (0.4). Plus, I think there is some sort of memory leak happening: the code doesn’t run, but all of the RAM is occupied.

t-v · June 14, 2018, 12:30pm

PyTorch master is what you get when you check out the git repo and recompile. 0.4 does have the bug; it’ll be fixed in 0.5.

Best regards

Thomas

prajjwal1 · June 14, 2018, 1:08pm

For now, I am not freezing the last embedding layer (because of the error) and have to train the entire model in one go.

knesgood · August 30, 2018, 7:00pm

Running into the same problem. Is there a big performance hit when not freezing the last layer?

t-v · August 30, 2018, 8:12pm

Which PyTorch version are you on? 0.4.1 should have the fix needed…
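
A quick way to answer that question is to compare the installed version against 0.4.1. The sketch below is only an illustration: `version_tuple` and `has_cudnn_rnn_fix` are hypothetical helpers, and the 0.4.1 threshold is taken from this post rather than checked against the release notes.

```python
import re


def version_tuple(version):
    """Turn '0.4.1' or '0.5.0a0+abc123' into a comparable tuple of ints."""
    public = version.split("+")[0]  # drop local build suffixes
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def has_cudnn_rnn_fix(version, first_fixed="0.4.1"):
    """Assumption from this thread: releases from 0.4.1 on carry the fix."""
    return version_tuple(version) >= version_tuple(first_fixed)


try:
    import torch
    print(torch.__version__, has_cudnn_rnn_fix(torch.__version__))
except ImportError:
    print("PyTorch is not installed")
```

If this reports an older version, disabling cuDNN as described earlier in the thread remains the fallback.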