I was looking at the code for AWD LSTMs (shown below) and was confused about the use of F.dropout with training=False.
import torch.nn as nn
import torch.nn.functional as F

WEIGHT_HH = 'weight_hh_l0'

class WeightDropout(nn.Module):
    def __init__(self, module, weight_p=0., layer_names=[WEIGHT_HH]):
        super().__init__()
        self.module,self.weight_p,self.layer_names = module,weight_p,layer_names
        for layer in self.layer_names:
            # Makes a copy of the weights of the selected layers.
            w = getattr(self.module, layer)
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
            self.module._parameters[layer] = F.dropout(w, p=self.weight_p, training=False)

    def _setweights(self):
        # Recompute the wrapped module's weights by applying dropout to the raw copies.
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            self.module._parameters[layer] = F.dropout(raw_w, p=self.weight_p, training=self.training)
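For context, here is a minimal sketch of how I've been poking at the wrapper (the nn.LSTM sizes and the 0.5 probability are just example values I picked, and I call _setweights directly because the snippet above omits forward):

lstm = nn.LSTM(input_size=10, hidden_size=20)
wd = WeightDropout(lstm, weight_p=0.5)

# After __init__ there are two versions of the hidden-to-hidden weights:
print(wd.weight_hh_l0_raw.shape)                     # the registered '_raw' copy
print(wd.module._parameters['weight_hh_l0'].shape)   # the weight the LSTM actually uses

# In training mode, _setweights replaces the LSTM's weight with a
# dropped-out version of the raw copy.
wd.train()
wd._setweights()
print((wd.module._parameters['weight_hh_l0'] == 0).sum().item(), "entries zeroed")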
I asked about it on GitHub and the answer was that it is there to initially copy the weights over to their '_raw' version.
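For reference, this is the behaviour of F.dropout itself that made me wonder (a quick sanity check I ran; the tensor size is arbitrary):

import torch
import torch.nn.functional as F

w = torch.randn(4, 4)

# With training=False, dropout is a no-op: the values come back unchanged,
# so the call effectively just hands the weights back.
assert torch.equal(F.dropout(w, p=0.5, training=False), w)

# With training=True, roughly p of the entries are zeroed and the survivors
# are rescaled by 1/(1-p).
dropped = F.dropout(w, p=0.5, training=True)
print((dropped == 0).sum().item(), "entries zeroed")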
So my questions are:
- Why use F.dropout with training=False? Can't you simply do self.module._parameters[layer] = w.clone()?
- Do you even need to copy the weights across like this at all? getattr(self.module, layer) is exactly the same as self.module._parameters[layer] (see the check sketched below), which would make the last line of __init__ unnecessary.
- What is the purpose of having a weight_hh_l0_raw version? Is it because the line self.module._parameters[layer] = F.dropout(raw_w, p=self.weight_p, training=self.training) overwrites the weights and you want to preserve the originals?
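To make the second question concrete, this is the kind of check I have in mind (a bare nn.LSTM with made-up sizes, before any wrapping):

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20)

# Parameters aren't stored in the instance __dict__, so getattr falls through
# to nn.Module.__getattr__, which looks the name up in _parameters. Both
# expressions therefore return the very same Parameter object:
assert getattr(lstm, 'weight_hh_l0') is lstm._parameters['weight_hh_l0']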