To make this comparison more precise for those reading along: during the previous batch's backward pass, PyTorch frees that batch's graph (its saved intermediate buffers). On the next backward it then finds a node that still belongs to the previous batch but no longer has the graph connections and saved tensors it would need to propagate the gradient, hence the error.
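A minimal sketch (not the lesson code, just a toy recurrent update with made-up names `w` and `h`) that reproduces the situation: the carried hidden state still points into the previous batch's graph, so the second `backward()` runs into the freed nodes.

```python
import torch

w = torch.randn(3, 3, requires_grad=True)
h = torch.zeros(1, 3)          # "hidden state" carried across batches

for step in range(2):
    x = torch.randn(1, 3)
    h = torch.tanh(x @ w + h @ w)   # h now depends on the previous batch's graph
    loss = h.sum()
    loss.backward()                 # frees the graph, including the part h came from
    # On the second iteration this raises:
    # RuntimeError: Trying to backward through the graph a second time ...
    # The fix is to cut the link before reusing the state: h = h.detach()
```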
It'll all become clearer once Jeremy decides to do "even more impractical deep learning for coders" where he also re-implements autograd.