Why using torch.no_grad() when initializing embedding weights?

The following snippet is from the fastai code (see https://github.com/fastai/fastai/blob/master/fastai/layers.py, Line 290):

def embedding(ni:int, nf:int) -> nn.Module:
    "Create an embedding layer."
    emb = nn.Embedding(ni, nf)
    # See https://arxiv.org/abs/1711.09160
    with torch.no_grad(): trunc_normal_(emb.weight, std=0.01)
    return emb

Now if we check emb.weight.requires_grad, it is still True. What was the purpose of torch.no_grad() if gradient calculation is still active?
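
For reference, here is roughly the check I ran (a minimal sketch; the sizes are arbitrary and normal_ stands in for fastai's trunc_normal_):

import torch
import torch.nn as nn

emb = nn.Embedding(10, 3)
with torch.no_grad():
    emb.weight.normal_(std=0.01)  # in-place init, not recorded by autograd

print(emb.weight.requires_grad)   # True: no_grad() does not change this flag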

Hi,

This is also done in the official nn.init module (see this, for example). My guess is that we only need the gradient with respect to the parameters of the layer, not with respect to the values used to initialize them. In your example, without torch.no_grad(), I think we would also end up computing the gradient of the loss with respect to the mean and std used in the initialization.
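
To illustrate the point, something like the following (a minimal sketch, not the fastai code; normal_ stands in for trunc_normal_) shows that the init runs outside of autograd, and backward() only produces gradients for the layer's own parameters:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 3)

with torch.no_grad():
    # The in-place init is not tracked by autograd. Outside of no_grad(),
    # an in-place op on a leaf Parameter that requires grad is not even
    # allowed and raises a RuntimeError.
    emb.weight.normal_(std=0.01)

out = emb(torch.tensor([0, 1])).sum()
out.backward()
print(emb.weight.grad.shape)  # torch.Size([10, 3]): gradient w.r.t. the layer's weights only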
