RNNLearner `ValueError: can't optimize a non-leaf Tensor`

I apologize in advance for the length of this post…

I eventually want to use RNNLearner with a custom model architecture, so as a first step I’m building the model from RNNCore directly, without the language_model class method abstraction. When I try to fit the data, I get the following exception: ValueError: can't optimize a non-leaf Tensor.
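For readers unfamiliar with the message, here is a minimal sketch of how it can arise in plain PyTorch, independent of fastai. The tensor names are made up for illustration: a tensor created directly with requires_grad=True is a leaf, while anything computed from it is not, and the optimizer refuses the latter.

```python
import torch
import torch.nn.functional as F

# A tensor created directly by the user with requires_grad=True is a leaf.
raw_weight = torch.randn(4, 4, requires_grad=True)

# A tensor computed from it (here via dropout in training mode) is a non-leaf.
dropped_weight = F.dropout(raw_weight, p=0.5, training=True)

print(raw_weight.is_leaf, dropped_weight.is_leaf)

# The optimizer accepts the leaf tensor...
torch.optim.Adam([raw_weight], lr=1e-3)

# ...and raises a ValueError for the non-leaf one.
error_message = None
try:
    torch.optim.Adam([dropped_weight], lr=1e-3)
except ValueError as err:
    error_message = str(err)
print(error_message)
```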

Here’s a step-through of what I’m doing:

I’m using the IMDB databunch as is

path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path)

Creating the model graph just as in RNNLearner.language_model()

# Create a full AWD-LSTM.
rnn_enc = RNNCore(vocab_sz=vocab_sz,

enc = rnn_enc.encoder if tie_weights else None
model = SequentialRNN(rnn_enc, LinearDecoder(vocab_sz, emb_sz, output_p, tie_encoder=enc, bias=bias))

Then trying to fit once

learn = RNNLearner(data_lm, model)

In that last chunk of code, I run into ValueError: can't optimize a non-leaf Tensor. I’ve tried to diagnose it, and the tensors that seem to be problematic are the ones created here in RNNCore():

self.rnns = [nn.LSTM(emb_sz if l == 0 else n_hid, (n_hid if l != n_layers - 1 else emb_sz)//self.ndir,
                1, bidirectional=bidir) for l in range(n_layers)]
self.rnns = [WeightDropout(rnn, weight_p) for rnn in self.rnns]

I’m a bit lost on what’s happening here. My guess is that it overwrites the plain LSTMs with LSTMs wrapped in weight dropout, and that the forward pass later only goes through the wrapped LSTMs?
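One way to picture where a non-leaf tensor could come from is the toy wrapper below. It is a hypothetical, simplified stand-in written for this thread, not fastai's WeightDropout: it assumes the wrapper keeps a raw copy of the weight and, on every forward pass, replaces the wrapped module's weight with a dropped-out version of that copy. ToyWeightDropout, weight_raw, and the nn.Linear layer are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyWeightDropout(nn.Module):
    """Hypothetical, simplified weight-dropout wrapper (not fastai's code)."""

    def __init__(self, module, weight_p, layer_name="weight"):
        super().__init__()
        self.module, self.weight_p, self.layer_name = module, weight_p, layer_name
        # Keep a trainable copy of the wrapped weight under a new name.
        raw = getattr(module, layer_name)
        self.weight_raw = nn.Parameter(raw.detach().clone())

    def forward(self, x):
        # Overwrite the wrapped module's weight with a dropped-out version
        # of the raw copy. It is computed from weight_raw, so it is a non-leaf.
        dropped = F.dropout(self.weight_raw, p=self.weight_p, training=self.training)
        self.module._parameters[self.layer_name] = dropped
        return self.module(x)


torch.manual_seed(0)
model = ToyWeightDropout(nn.Linear(8, 8), weight_p=0.5)
model.train()
out = model(torch.randn(2, 8))

# After one forward pass, parameters() yields a mix of leaf and non-leaf tensors.
leaf_flags = {name: p.is_leaf for name, p in model.named_parameters()}
print(leaf_flags)

# Optimizing only the leaves still updates the raw weight that feeds the dropped one.
optimizer = torch.optim.Adam([p for p in model.parameters() if p.is_leaf], lr=1e-3)
out.sum().backward()
optimizer.step()
```

In this toy version, the overwritten weight is only a derived value, and the gradient reaches the raw copy, so filtering on is_leaf does not drop anything trainable. Whether the same holds for the real model depends on how the library's wrapper is written.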

This issue seems kind of related here

I’ve also tried manually fitting with straight PyTorch like this:

# this is a hack for now... not sure if this messes up the graph somewhere by doing this
opt_params = [par for par in model.parameters() if par.is_leaf]
optimizer = torch.optim.Adam(opt_params, lr=learning_rate)

y_pred = model(x)

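for t in range(5):
    # Forward pass: the model returns a tuple, the first item is the prediction
    y_pred = model(x)[0]
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Backward pass and update of the leaf parameters only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()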

This only works when I ignore “non-leaf” tensors, but I’m not sure if I’m breaking something by ignoring them.

I guess my ultimate questions are:

  1. Am I understanding correctly what’s happening with the LSTMs and the LSTM dropouts?
  2. Is it correct to ignore the non-leaf tensors like I’m doing manually?
  3. If it’s OK to ignore the non-leaf tensors, is there a way to do something like that through the Learner() framework?

I’m guessing you’re missing an initial reset() call, assuming everything else is similar to the get_language_model method.

That seems to have worked! Now I just feel silly that’s what it was… :sweat_smile:

Thanks so much @sgugger for having an answer for everything I’ve asked on the forum!

It just helps properly initialize those RNN modules.
You’re very welcome, glad I can help!

Hi, I followed the text model tutorial for fastai v1, but instead of fine-tuning the WT103_1 model, I’d like to create a new model. I got the error ‘can not optimize a non-leaf Tensor’ with the following code:

path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path)
learn = language_model_learner(data_lm, drop_mult=0.5)

Sorry to bother you, but I’ve searched a lot and couldn’t find the cause.

The bug was fixed after updating fastai from 1.0.39 to 1.0.40.
