Symptom:
Accuracy between epochs while training remains almost the same (instead of steadily increasing before stabilizing to > 96%). Accuracy never reaches beyond 51% even after training for 20 epochs.
TLDR; If you don’t use double parantheses in the init_params(())
call given the way it is defined, the parameters don’t get tracked correctly by requires_grad_().
Debugging:
-
Since accuracy was not increasing, the model wasn’t learning. Learning doesn’t happen if calculation of loss and gradients doesn’t happen correctly.
-
I checked the loss after each training batch within a single epoch and it wasn’t steadily decreasing. This meant either the mnist_loss() function wasn’t getting calculated right or the gradients weren’t being updated correctly.
-
I verified that the predictions and loss value was getting correctly computed. This must mean the gradients weren’t right. Upon checking the calc_grad() function, sure enough the gradients were off.
-
Used the trick shown in the Jeremy’s notebook - I called calc_grad() twice successively and the gradients weren’t added up as was claimed in the notebook. This meant either gradient calculation was incorrect (unlikely since its a fairly stable library implementation) or I messed up in marking the right variables for gradient calculation().
-
I checked requires_grad_() and it was appropriately called in init_params().
-
I next checked the call to init_params() (after a ton of other debugging and several hours going through everything again and again). Found the mistake -> need to call init_params( ( ) ) with double parantheses but I was using single paranthesis. This was messing up how the parameters need to be set up for requires_grad_() (check details below).
Details:
Below is a fairly innocuous looking function def
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
And the call happens like so,
weights = init_params((28*28, 1))
-> double paranthesis (a Python feature)
The thing to notice here is that in the call above, double parantheses need to be used. Read the below link for the difference between double and single paranthesis in a function call in Python:
Without the right way of calling init_params()
, the requires_grad_()
function call within init_params doesn’t keep track of the parameters correctly (I don’t know why or how). But calling it with double parantheses fixes the issue.