Multilingual ULMFiT

Please ignore most of it. My apologies for the distraction and the somewhat off-topic reply.

I just realized that as long as we pass only the non-frozen parameters to the optimizer, there's simply no need to worry about duplicated weights and non-leaf tensors. (Probably because the duplicated weights share the same tensor object assigned in __init__(), so Module._named_members() returns only one of them.)
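To make the deduplication concrete, here's a minimal standalone sketch (standard PyTorch, not code from this thread): when two modules are tied to the same Parameter object, named_parameters() reports it only once, so the optimizer never receives a duplicate.

```python
import torch
import torch.nn as nn

class TiedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4, bias=False)
        self.decoder = nn.Linear(4, 4, bias=False)
        # Tie the weights: both modules now hold the very same Parameter object.
        self.decoder.weight = self.encoder.weight

model = TiedModel()
# named_parameters() deduplicates by object identity, so the tied weight
# shows up only once.
names = [n for n, _ in model.named_parameters()]
print(names)                          # ['encoder.weight']
print(len(list(model.parameters())))  # 1
```

So passing model.parameters() to the optimizer is already safe with respect to duplicates.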

One thing that's still slightly unclear is that F.dropout(training=False) in __init__() seems to have no effect now (perhaps it was there to keep is_leaf=True for older versions of PyTorch?), unless it does some magic that QRNN requires. In other words, I regret that I didn't post this under another seemingly more relevant thread, Using F.dropout to copy parameters.
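To illustrate the is_leaf guess (again just a standalone sketch, not the actual QRNN code): with training=False, F.dropout is a no-op that returns the input unchanged, so a Parameter stays a leaf; with training=True it goes through autograd and the result is no longer a leaf.

```python
import torch
import torch.nn.functional as F

w = torch.nn.Parameter(torch.randn(4, 4))

# training=False: dropout short-circuits and the output equals the input,
# so we still have a leaf tensor.
out_eval = F.dropout(w, p=0.5, training=False)
print(torch.equal(out_eval, w), out_eval.is_leaf)  # True True

# training=True: a new tensor with a grad_fn is produced, hence not a leaf.
out_train = F.dropout(w, p=0.5, training=True)
print(out_train.is_leaf)  # False
```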

Again, sorry for the hassle.