Hi,
I’m trying to put some clues together here and would be really grateful for your advice.
I’ve recently been re-investigating `WeightDropout`, looking at two parameter-related issues: the duplicated weights, and the use of `F.dropout(training=False)` as an identity function to initialize the dropped weight. The latter was raised in a dangling thread (Using F.dropout to copy parameters), and I figure it may trace back to the QRNN discussion here and then the revisions in https://github.com/n-waves/fastai/commit/d60adca369f6e548a494109a849ea5ebb1061a61 and https://github.com/fastai/fastai/commit/b842586e9b080ed83afb251d4236ec6843d823de.
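For reference, here is a minimal sketch of how I read the pattern after those commits. This is my own simplified reconstruction, not the exact fastai code, and the parameter names (`weight_p`, `layer_names`) are approximate:

```python
import warnings
import torch.nn as nn
import torch.nn.functional as F

class WeightDropout(nn.Module):
    "Simplified sketch: keep a `*_raw` Parameter and overwrite the wrapped module's weight on every forward."
    def __init__(self, module, weight_p, layer_names=('weight_hh_l0',)):
        super().__init__()
        self.module, self.weight_p, self.layer_names = module, weight_p, layer_names
        for layer in self.layer_names:
            w = getattr(self.module, layer)
            # the copy registered on the wrapper; the original Parameter stays on
            # self.module, which is where the duplication comes from
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
            # identity call: with training=False the weight passes through unchanged
            self.module._parameters[layer] = F.dropout(w, p=self.weight_p, training=False)

    def _setweights(self):
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            # during training this produces a new, non-leaf tensor
            self.module._parameters[layer] = F.dropout(raw_w, p=self.weight_p, training=self.training)

    def forward(self, *args):
        self._setweights()
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # silence the RNN weights warning
            return self.module(*args)
```

With this layout the same weight shows up twice in `parameters()`: once as the original Parameter on `self.module` and once as the `*_raw` copy on the wrapper.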
For the former issue of the duplicated weights, I’m wondering if we can handle it like the original Salesforce version, which deletes the original weight in `__init__()` once and then puts it back in `forward()`, so that the gradient is picked up correctly without keeping an extra copy of the weight. It may have something to do with the frequently changing behavior of `Tensor.is_leaf` across PyTorch versions, according to the discussion I participated in for AllenNLP’s `DropConnect`: Add workarounds to avoid _flat_weights issues for DropConnect #issuecomment-546670214. Perhaps the `F.dropout(training=False)` call is there to initialize the weight with `Tensor.is_leaf=True`, so that the optimizer can add the parameter groups correctly.
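To be concrete about what I mean by the Salesforce trick, here is a rough sketch modeled on the original `WeightDrop` from salesforce/awd-lstm-lm (simplified and from memory, so the details may not match that code exactly):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDrop(nn.Module):
    """Sketch of the Salesforce-style weight drop: the wrapped module's
    weight Parameter is deleted once in __init__ and replaced by a
    `*_raw` Parameter, then re-created as a plain tensor every forward."""
    def __init__(self, module, weights, p=0.5):
        super().__init__()
        self.module, self.weights, self.p = module, weights, p
        for name in self.weights:
            w = getattr(self.module, name)
            del self.module._parameters[name]  # no duplicate left behind
            self.module.register_parameter(name + '_raw', nn.Parameter(w.data))

    def _setweights(self):
        for name in self.weights:
            raw = getattr(self.module, name + '_raw')
            # a non-leaf tensor: gradients still flow back to `raw`, but newer
            # PyTorch RNNs complain because `_flat_weights` no longer tracks
            # this attribute (the issue linked above)
            setattr(self.module, name, F.dropout(raw, p=self.p, training=self.training))

    def forward(self, *args):
        self._setweights()
        return self.module(*args)


# Only the `*_raw` leaf Parameter is what the optimizer should see:
wd = WeightDrop(nn.Linear(4, 4), ['weight'], p=0.5)
raw = wd.module.weight_raw
print(raw.is_leaf)                                   # True  -> safe to put in a param group
print(F.dropout(raw, p=0.5, training=True).is_leaf)  # False -> the dropped weight is not a leaf
```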
Please check my revision for `DropConnect` and let me know whether it is correct or not.
Thank you!