Please ignore most of it. My apologies for the distraction and the somewhat off-topic reply.
I just realized that as long as we pass non-frozen parameters to the optimizer, there's simply no need to worry about duplicated weights and non-leaf tensors. (Probably because the duplicated weights are the same tensor object assigned in `__init__()`, so `Module._named_members()`, which backs `named_parameters()`, returns only one of them. A quick sketch of that is below.)
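To illustrate the deduplication I mean (the `TiedModel` module below is made up for illustration; this is just what I observe on the versions I've tried):

```python
import torch
import torch.nn as nn

class TiedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4, bias=False)
        self.fc2 = nn.Linear(4, 4, bias=False)
        self.fc2.weight = self.fc1.weight  # duplicated weight: same Parameter object

model = TiedModel()

# named_parameters() (built on Module._named_members()) skips tensors it has
# already seen, so the shared weight is reported only once.
print([name for name, _ in model.named_parameters()])  # ['fc1.weight']

# Consequently the optimizer also receives the shared tensor exactly once.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
print(sum(len(g["params"]) for g in opt.param_groups))  # 1
```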
A slightly unclear thing is that `F.dropout(training=False)` in `__init__()` seems to have no effect now (perhaps it was there to keep `is_leaf=True` on old versions of PyTorch?), unless it does some magic that QRNN requires. In other words, I regret not posting this under the seemingly more relevant thread, Using F.dropout to copy parameters.
Again, sorry for the hassle.