Why are we using the raw weights saved at the beginning of training rather than the updated weights after the forward pass through each batch?

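With line 32, the raw weights are registered as a parameter of the network, so they are updated during the optimizer step. They are not frozen at their initial values: the dropped-out copy used in each forward pass is recomputed from the current raw weights.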

Ok, thanks. 🙂

Weight dropout randomly drops individual weights in the weight matrices at each training step. Intuitively, this is dropping connections between layers, forcing the network to adapt to a different connectivity at each training step.
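
To make this concrete, here is a minimal pure-Python sketch of the idea for a single linear layer. It is not the library code discussed above: the names `dropout_mask`, `apply_mask`, `forward`, and `train_step` are hypothetical, the squared-error loss is chosen only to keep the gradient easy to write by hand, and the `1 / (1 - p)` rescaling of kept weights is an assumption that mirrors standard inverted dropout. The point it illustrates is that the forward pass uses a freshly masked copy, while the gradient step is applied to the raw weights.

```python
import random


def dropout_mask(rows, cols, p, rng):
    """Sample a mask: each entry is 0.0 with probability p, else 1 / (1 - p)."""
    scale = 1.0 / (1.0 - p)
    return [[scale if rng.random() >= p else 0.0 for _ in range(cols)]
            for _ in range(rows)]


def apply_mask(raw_w, mask):
    """Return a masked copy of the weights; raw_w itself is left untouched."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(raw_w, mask)]


def forward(w, x):
    """Plain matrix-vector product y = w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def train_step(raw_w, x, target, p, lr, rng):
    """One SGD step on loss = 0.5 * sum((y - target)^2), updating raw_w in place."""
    mask = dropout_mask(len(raw_w), len(raw_w[0]), p, rng)
    w = apply_mask(raw_w, mask)   # different connectivity at each step
    y = forward(w, x)
    for i, row in enumerate(raw_w):
        err = y[i] - target[i]
        for j in range(len(row)):
            # Chain rule through w = raw_w * mask:
            # dL/draw_w[i][j] = err * x[j] * mask[i][j]
            row[j] -= lr * err * x[j] * mask[i][j]
    return mask


if __name__ == "__main__":
    rng = random.Random(0)
    raw_w = [[1.0, 2.0], [3.0, 4.0]]
    for step in range(3):
        mask = train_step(raw_w, x=[1.0, 1.0], target=[0.0, 0.0],
                          p=0.5, lr=0.1, rng=rng)
        print(step, mask, raw_w)
```

In this sketch a weight that is dropped in a given step receives a zero gradient for that step, but it keeps its current raw value and can be updated again in a later step when the mask keeps it.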