I apologize in advance for the length of this post…
In an attempt to use RNNLearner with a custom model architecture, I’m first working through using it with RNNCore directly, without the language_model class-method abstraction. When I try to fit the data, I get the following exception: ValueError: can't optimize a non-leaf Tensor.
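For context, PyTorch raises this whenever a tensor that was produced by an operation (rather than created directly, like an nn.Parameter) is handed to an optimizer. A minimal repro of just the error:

import torch

w = torch.randn(3, requires_grad=True)  # a leaf: created directly
w2 = w * 2                              # a non-leaf: produced by an op
torch.optim.Adam([w2])                  # ValueError: can't optimize a non-leaf Tensor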
Here’s a step-through of what I’m doing:
I’m using the IMDB sample databunch as-is:
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path)
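The model-construction snippet below assumes the usual AWD-LSTM hyperparameters are already in scope. For completeness, here is roughly how I set them (the values are illustrative, in the ballpark of fastai’s language-model defaults, and the vocab attribute path is just what my databunch exposes):

# Illustrative hyperparameters for the AWD-LSTM below (roughly fastai's
# language-model defaults; adjust to taste).
vocab_sz = len(data_lm.train_ds.vocab.itos)  # vocab size from the databunch
emb_sz, n_hid, n_layers = 400, 1150, 3
pad_token, qrnn, bidir = 1, False, False
input_p, embed_p, weight_p, hidden_p, output_p = 0.6, 0.1, 0.5, 0.2, 0.4
tie_weights, bias = True, True
learning_rate = 1e-3  # used in the manual PyTorch loop further down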
Next, I create the model graph just as in RNNLearner.language_model():
# Create a full AWD-LSTM.
rnn_enc = RNNCore(vocab_sz=vocab_sz,
                  emb_sz=emb_sz,
                  n_hid=n_hid,
                  n_layers=n_layers,
                  pad_token=pad_token,
                  qrnn=qrnn,
                  bidir=bidir,
                  hidden_p=hidden_p,
                  input_p=input_p,
                  embed_p=embed_p,
                  weight_p=weight_p)
enc = rnn_enc.encoder if tie_weights else None
model = SequentialRNN(rnn_enc, LinearDecoder(vocab_sz, emb_sz, output_p, tie_encoder=enc, bias=bias))
Then I try to fit once:
learn = RNNLearner(data_lm, model)
learn.fit(1)
In that last chunk of code, I run into ValueError: can't optimize a non-leaf Tensor. I’ve tried diagnosing it, and the tensors that seem to be problematic are the ones created here in RNNCore():
self.rnns = [nn.LSTM(emb_sz if l == 0 else n_hid, (n_hid if l != n_layers - 1 else emb_sz)//self.ndir,
                     1, bidirectional=bidir) for l in range(n_layers)]
self.rnns = [WeightDropout(rnn, weight_p) for rnn in self.rnns]
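For what it’s worth, a quick check along these lines surfaces them (everything yielded by model.parameters() should normally be a leaf):

# Diagnostic: list any entries from named_parameters() that are not leaves;
# these are the tensors Adam refuses to optimize.
non_leaves = [name for name, p in model.named_parameters() if not p.is_leaf]
print(non_leaves)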
I’m a bit lost on what’s happening here. My guess is that the plain LSTMs get overwritten by the weight-dropped LSTMs, and the forward pass later only goes through the weight-dropped versions?
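If I understand it, the pattern is roughly the following (a minimal sketch of the weight-dropout idea, not fastai’s actual WeightDropout code): the true weight is kept as a leaf parameter under a _raw name, and every forward pass writes a dropped-out copy back under the original name. Since that copy comes out of F.dropout, it is a non-leaf tensor, and if it lands in the module’s _parameters dict, model.parameters() will yield it and Adam will reject it:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropSketch(nn.Module):
    "Minimal sketch of the weight-dropout pattern; not fastai's exact code."
    def __init__(self, module, weight_p, layer_names=('weight_hh_l0',)):
        super().__init__()
        self.module, self.weight_p, self.layer_names = module, weight_p, layer_names
        for layer in self.layer_names:
            w = getattr(self.module, layer)
            # Keep the true (leaf) parameter on the wrapper as '<name>_raw'.
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))

    def forward(self, *args):
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            # F.dropout produces a new tensor in the graph, so this copy is
            # NOT a leaf; storing it in _parameters makes parameters() yield it.
            self.module._parameters[layer] = F.dropout(
                raw_w, p=self.weight_p, training=self.training)
        return self.module(*args)

lstm = nn.LSTM(10, 20)
wd = WeightDropSketch(lstm, weight_p=0.5)
wd(torch.randn(5, 2, 10))
print([n for n, p in wd.named_parameters() if not p.is_leaf])
# ['module.weight_hh_l0']
torch.optim.Adam(wd.parameters(), lr=1e-3)  # ValueError: can't optimize a non-leaf Tensor

So my reading is: the _raw weights are the real parameters, and the same-named copies are transient graph nodes that shouldn’t be optimized directly. Does that sound right?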
This issue seems related to one discussed here.
I’ve also tried manually fitting with straight PyTorch like this:
# This is a hack for now... not sure if it messes up the graph somewhere.
# Only hand the leaf tensors to Adam, since it refuses the non-leaf ones.
opt_params = [par for par in model.parameters() if par.is_leaf]
optimizer = torch.optim.Adam(opt_params, lr=learning_rate)

for t in range(5):
    y_pred = model(x)[0]  # the model returns a tuple; [0] is the decoded output
    loss = loss_fn(y_pred, y)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
This only works when I ignore “non-leaf” tensors, but I’m not sure if I’m breaking something by ignoring them.
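To make question 3 below concrete, this is the kind of shim I have in mind: a hypothetical opt_func that filters out non-leaf tensors before building Adam. (This assumes RNNLearner forwards extra kwargs like opt_func on to Learner, and that the optimizer factory may receive either a flat list of tensors or param-group dicts; untested.)

def leaf_adam(params, **kwargs):
    # Hypothetical: drop non-leaf tensors before handing everything to Adam.
    params = list(params)
    if params and isinstance(params[0], dict):  # param-group dicts
        params = [{**g, 'params': [p for p in g['params'] if p.is_leaf]} for g in params]
    else:                                       # flat list of tensors
        params = [p for p in params if p.is_leaf]
    return torch.optim.Adam(params, **kwargs)

learn = RNNLearner(data_lm, model, opt_func=leaf_adam)
learn.fit(1)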
I guess my ultimate questions are:
- Am I understanding correctly what’s happening with the LSTMs and the WeightDropout wrappers?
- Is it correct to ignore the non-leaf tensors like I’m doing manually?
- If it’s OK to ignore the non-leaf tensors, is there a way to do something like that through the Learner() framework?