Running batch norm tweaks

jeremy · April 13, 2019, 4:15am

Lesson 10 Discussion & Wiki (2019)

or are you saying that replacing:
        self.sums.detach_()
        self.sqrs.detach_()
with:
        x = x.detach()
and not skipping any calculations has a detrimental impact on the outcome?

Yes, that’s exactly what I saw. Accuracy with bs=32 went from 97% to 90%, IIRC.

It was my intent to have the next iteration of calcs be part of the graph. I did it this way because I was trying to avoid having a continuously growing history in the graph, which I think is working AFAICT.
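To make the difference concrete, here is a minimal sketch of the two variants. It is not the actual lesson code: the function names, `MOM`, and the out-of-place update are my own assumptions for illustration (the original calls `detach_()` in place on the module's buffers). It shows that detaching the running buffer keeps the current batch's stats in the graph while dropping older history, whereas detaching `x` removes the stats from the graph entirely.

```python
import torch

MOM = 0.1  # hypothetical momentum for the running average

def update_detach_buffer(sums, x, mom=MOM):
    # Cut the buffer's link to earlier batches, then fold in the current batch.
    # The current batch's sum stays in the autograd graph.
    sums = sums.detach()
    s = x.sum(dim=0, keepdim=True)
    return sums + mom * (s - sums)

def update_detach_input(sums, x, mom=MOM):
    # Detach the input instead: the stats carry no gradient back to x.
    x = x.detach()
    s = x.sum(dim=0, keepdim=True)
    return sums + mom * (s - sums)

x1 = torch.randn(4, requires_grad=True)
x2 = torch.randn(4, requires_grad=True)

# Variant A: two steps, then backprop from the second step's buffer.
sums_a = update_detach_buffer(torch.zeros(1), x1)
sums_a = update_detach_buffer(sums_a, x2)
sums_a.sum().backward()
# x2 receives a gradient (current step is in the graph);
# x1 does not (its history was cut before the second update).
print("x2.grad:", x2.grad, "x1.grad:", x1.grad)

# Variant B: same numbers, but nothing is connected to the graph.
sums_b = update_detach_input(torch.zeros(1), x1)
sums_b = update_detach_input(sums_b, x2)
print("sums_b.requires_grad:", sums_b.requires_grad)
```

Both variants produce the same running values in the forward pass; they only differ in what the backward pass can see, which is presumably where the accuracy gap comes from.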

But it’s certainly true that I don’t fully understand the details of all this, and I’m sure there are things that need to be fixed to make this work with the “occasional skipping”.