# Why using torch.no_grad() when initializing embedding weights?

The following snippet is from the fastai code (see https://github.com/fastai/fastai/blob/master/fastai/layers.py, line 290):

```python
def embedding(ni:int, nf:int) -> nn.Module:
    "Create an embedding layer."
    emb = nn.Embedding(ni, nf)
    # See https://arxiv.org/abs/1711.09160
    with torch.no_grad(): trunc_normal_(emb.weight, std=0.01)
    return emb
```
This is also done in the official `nn.init` module (see its source, for example). My guess is that we only need gradients with respect to the parameters of the layers themselves, not with respect to the values used to initialize them. In your example, without `torch.no_grad()`, I think the initialization ops would also be tracked by autograd, so we would end up getting gradients of the loss with respect to the `mean` and `std` used in the initialization.
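A quick way to see the effect (a minimal sketch, using plain `normal_` instead of fastai's `trunc_normal_`): recent PyTorch versions actually refuse an in-place initialization on a leaf tensor that requires grad when it is done outside `torch.no_grad()`, which is one concrete reason the wrapper is needed:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)

# Outside torch.no_grad(), an in-place init on a leaf tensor that
# requires grad is rejected by autograd.
raised = False
try:
    emb.weight.normal_(0, 0.01)
except RuntimeError as e:
    raised = True
    print(f"RuntimeError: {e}")

# Inside torch.no_grad(), the op is not tracked, so it succeeds --
# and the weight still requires grad for training afterwards.
with torch.no_grad():
    emb.weight.normal_(0, 0.01)

print(raised)                    # True
print(emb.weight.requires_grad)  # True
print(emb.weight.grad_fn)        # None: init is not part of the graph
```

So `torch.no_grad()` keeps the initialization out of the autograd graph entirely, while leaving the parameter trainable.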