I apologize in advance for the length of this post…
In an attempt to use RNNLearner with a custom model architecture, I’m first working through using it with RNNCore directly, without the language_model class-method abstraction. When I try to fit the data, I get the following exception: ValueError: can't optimize a non-leaf Tensor.
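For context, PyTorch raises this whenever a tensor that was produced by an operation (rather than created directly, like an nn.Parameter) is handed to an optimizer. A minimal repro of just the error:

import torch

w = torch.randn(3, requires_grad=True)  # a leaf: created directly
w2 = w * 2                              # a non-leaf: produced by an op
torch.optim.Adam([w2])                  # ValueError: can't optimize a non-leaf Tensor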
Here’s a step-through of what I’m doing:
I’m using the IMDB sample databunch as-is:
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path)
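The model-construction snippet below assumes the usual AWD-LSTM hyperparameters are already in scope. For completeness, here is roughly how I set them (the values are illustrative, in the ballpark of fastai’s language-model defaults, and the vocab attribute path is just what my databunch exposes):

# Illustrative hyperparameters for the AWD-LSTM below (roughly fastai's
# language-model defaults; adjust to taste).
vocab_sz = len(data_lm.train_ds.vocab.itos)  # vocab size from the databunch
emb_sz, n_hid, n_layers = 400, 1150, 3
pad_token, qrnn, bidir = 1, False, False
input_p, embed_p, weight_p, hidden_p, output_p = 0.6, 0.1, 0.5, 0.2, 0.4
tie_weights, bias = True, True
learning_rate = 1e-3  # used in the manual PyTorch loop further down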
Next, I create the model graph just as in RNNLearner.language_model():
# Create a full AWD-LSTM.
rnn_enc = RNNCore(vocab_sz=vocab_sz,
                  emb_sz=emb_sz,
                  n_hid=n_hid,
                  n_layers=n_layers,
                  pad_token=pad_token,
                  qrnn=qrnn,
                  bidir=bidir,
                  hidden_p=hidden_p,
                  input_p=input_p,
                  embed_p=embed_p,
                  weight_p=weight_p)
enc = rnn_enc.encoder if tie_weights else None
model = SequentialRNN(rnn_enc, LinearDecoder(vocab_sz, emb_sz, output_p, tie_encoder=enc, bias=bias))
Then I try to fit once:
learn = RNNLearner(data_lm, model)
learn.fit(1)
In that last chunk of code, I run into ValueError: can't optimize a non-leaf Tensor. I’ve tried diagnosing it, and the tensors that seem to be problematic are the ones created here in RNNCore():
self.rnns = [nn.LSTM(emb_sz if l == 0 else n_hid, (n_hid if l != n_layers - 1 else emb_sz)//self.ndir,
                     1, bidirectional=bidir) for l in range(n_layers)]
self.rnns = [WeightDropout(rnn, weight_p) for rnn in self.rnns]
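For what it’s worth, a quick check along these lines surfaces them (everything yielded by model.parameters() should normally be a leaf):

# Diagnostic: list any entries from named_parameters() that are not leaves;
# these are the tensors Adam refuses to optimize.
non_leaves = [name for name, p in model.named_parameters() if not p.is_leaf]
print(non_leaves)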
I’m a bit lost on what’s happening here. My guess is that the plain LSTMs get overwritten by the weight-dropped LSTMs, and the forward pass later only goes through the weight-dropped versions?
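If I understand it, the pattern is roughly the following (a minimal sketch of the weight-dropout idea, not fastai’s actual WeightDropout code): the true weight is kept as a leaf parameter under a _raw name, and every forward pass writes a dropped-out copy back under the original name. Since that copy comes out of F.dropout, it is a non-leaf tensor, and if it lands in the module’s _parameters dict, model.parameters() will yield it and Adam will reject it:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropSketch(nn.Module):
    "Minimal sketch of the weight-dropout pattern; not fastai's exact code."
    def __init__(self, module, weight_p, layer_names=('weight_hh_l0',)):
        super().__init__()
        self.module, self.weight_p, self.layer_names = module, weight_p, layer_names
        for layer in self.layer_names:
            w = getattr(self.module, layer)
            # Keep the true (leaf) parameter on the wrapper as '<name>_raw'.
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))

    def forward(self, *args):
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            # F.dropout produces a new tensor in the graph, so this copy is
            # NOT a leaf; storing it in _parameters makes parameters() yield it.
            self.module._parameters[layer] = F.dropout(
                raw_w, p=self.weight_p, training=self.training)
        return self.module(*args)

lstm = nn.LSTM(10, 20)
wd = WeightDropSketch(lstm, weight_p=0.5)
wd(torch.randn(5, 2, 10))
print([n for n, p in wd.named_parameters() if not p.is_leaf])
# ['module.weight_hh_l0']
torch.optim.Adam(wd.parameters(), lr=1e-3)  # ValueError: can't optimize a non-leaf Tensor

So my reading is: the _raw weights are the real parameters, and the same-named copies are transient graph nodes that shouldn’t be optimized directly. Does that sound right?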
This issue seems related to one discussed here.
I’ve also tried manually fitting with straight PyTorch like this:
# This is a hack for now... not sure if it messes up the graph somewhere.
# Only hand the leaf tensors to Adam, since it refuses the non-leaf ones.
opt_params = [par for par in model.parameters() if par.is_leaf]
optimizer = torch.optim.Adam(opt_params, lr=learning_rate)

for t in range(5):
    y_pred = model(x)[0]  # the model returns a tuple; [0] is the decoded output
    loss = loss_fn(y_pred, y)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
This only works when I ignore “non-leaf” tensors, but I’m not sure if I’m breaking something by ignoring them.
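To make question 3 below concrete, this is the kind of shim I have in mind: a hypothetical opt_func that filters out non-leaf tensors before building Adam. (This assumes RNNLearner forwards extra kwargs like opt_func on to Learner, and that the optimizer factory may receive either a flat list of tensors or param-group dicts; untested.)

def leaf_adam(params, **kwargs):
    # Hypothetical: drop non-leaf tensors before handing everything to Adam.
    params = list(params)
    if params and isinstance(params[0], dict):  # param-group dicts
        params = [{**g, 'params': [p for p in g['params'] if p.is_leaf]} for g in params]
    else:                                       # flat list of tensors
        params = [p for p in params if p.is_leaf]
    return torch.optim.Adam(params, **kwargs)

learn = RNNLearner(data_lm, model, opt_func=leaf_adam)
learn.fit(1)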
I guess my ultimate questions are:
- Am I understanding correctly what’s happening with the LSTMs and the WeightDropout wrappers?
- Is it correct to ignore the non-leaf tensors like I’m doing manually?
- If it’s OK to ignore the non-leaf tensors, is there a way to do something like that through the Learner() framework?