I apologize in advance for the length of this post…
In an attempt to use `RNNLearner` with a custom model architecture, I'm first running through using it with `RNNCore`, without the `language_model` class method abstraction. When trying to fit the data I get the following exception:
```
ValueError: can't optimize a non-leaf Tensor
```
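For context, this error can be reproduced in plain PyTorch with nothing fastai-specific: optimizers only accept leaf tensors (tensors created directly, not the result of an operation). A minimal sketch:

```python
import torch

w = torch.randn(3, requires_grad=True)  # leaf: created directly by the user
v = w * 2                               # non-leaf: result of an operation

print(w.is_leaf, v.is_leaf)             # True False

try:
    # Passing a non-leaf tensor to an optimizer raises the same ValueError.
    torch.optim.Adam([v], lr=1e-3)
except ValueError as e:
    print(e)
```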
Here’s a step-through of what I’m doing:
I'm using the IMDB sample databunch as is:

```python
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path)
```
Creating the model graph just as in

```python
# Create a full AWD-LSTM.
rnn_enc = RNNCore(vocab_sz=vocab_sz, emb_sz=emb_sz, n_hid=n_hid, n_layers=n_layers,
                  pad_token=pad_token, qrnn=qrnn, bidir=bidir, hidden_p=hidden_p,
                  input_p=input_p, embed_p=embed_p, weight_p=weight_p)
enc = rnn_enc.encoder if tie_weights else None
model = SequentialRNN(rnn_enc,
                      LinearDecoder(vocab_sz, emb_sz, output_p, tie_encoder=enc, bias=bias))
```
Then trying to fit once:

```python
learn = RNNLearner(data_lm, model)
learn.fit(1)
```
In that last chunk of code, I run into `ValueError: can't optimize a non-leaf Tensor`. I've tried diagnosing it, and the tensors that seem to be problematic are the ones created here in

```python
self.rnns = [nn.LSTM(emb_sz if l == 0 else n_hid,
                     (n_hid if l != n_layers - 1 else emb_sz) // self.ndir,
                     1, bidirectional=bidir) for l in range(n_layers)]
self.rnns = [WeightDropout(rnn, weight_p) for rnn in self.rnns]
```
I'm a bit lost on what's happening here. My guess is that this overwrites the plain LSTMs with dropout-wrapped LSTMs, so that the forward pass later only goes through the wrapped versions. Is that right?
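To see the mechanism in isolation, here is a rough sketch of the weight-drop trick using a plain `nn.Linear` (illustrative only, not fastai's actual `WeightDropout` code): the original weight is re-registered under a `_raw` name, which stays a leaf `Parameter`, and each forward pass recomputes `weight` by applying dropout to it, which makes `weight` a non-leaf tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

lin = nn.Linear(4, 4)

# Move the real weight to `weight_raw`; this remains a leaf Parameter.
w = lin.weight
del lin._parameters['weight']
lin.register_parameter('weight_raw', nn.Parameter(w.data))

# Each forward pass rebuilds `weight` from the raw copy. The dropped-out
# version is the *result of an operation*, hence a non-leaf tensor.
lin.weight = F.dropout(lin.weight_raw, p=0.5, training=True)
out = F.linear(torch.randn(2, 4), lin.weight, lin.bias)

print(lin.weight_raw.is_leaf)  # True  -> the parameter the optimizer should see
print(lin.weight.is_leaf)      # False -> handing this to Adam raises the error
```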
This seems somewhat related to the issue here:
I've also tried fitting manually with straight PyTorch, like this:

```python
# This is a hack for now... not sure if filtering the parameters like this
# messes up the graph somewhere.
opt_params = [par for par in model.parameters() if par.is_leaf]
optimizer = torch.optim.Adam(opt_params, lr=learning_rate)

for t in range(5):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
This only works when I skip the "non-leaf" tensors, but I'm not sure whether ignoring them breaks something.
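For what it's worth, in plain PyTorch (generic autograd behavior, not fastai-specific) gradients flow *through* a non-leaf tensor back to the leaf it was computed from, so only the leaves accumulate `.grad` and only they need to be in the optimizer; the non-leaf copy is rebuilt every forward pass anyway. A small sketch:

```python
import torch

raw = torch.randn(3, requires_grad=True)    # leaf, like a `weight_raw` parameter
mask = torch.tensor([1., 0., 1.])
dropped = raw * mask                        # non-leaf "dropped-out" version

dropped.sum().backward()

print(dropped.is_leaf)  # False
print(raw.grad)         # tensor([1., 0., 1.]) -> the gradient lands on the leaf
```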
I guess my ultimate questions are:
- Am I understanding correctly what's happening with the LSTMs and the dropout-wrapped LSTMs?
- Is it correct to ignore the non-leaf tensors the way I'm doing manually?
- If it's OK to ignore the non-leaf tensors, is there a way to do something like that through the