In the `BatchNorm` class from Lesson 10 (shown below), Jeremy used `register_buffer` for defining `vars` and `means`, and I could understand his explanation for that fine. However, his explanation made me wonder why we do NOT have to use `register_buffer` for defining `self.eps` as well, and I would like help to understand why.
At 1:43:12 of the lecture video for Lesson 10, Jeremy says:

> If we move the model to the GPU, anything registered as a buffer will be moved to the GPU as well. If we didn't do that, then it tries to do the calculation down here, and `vars` and `means` are not on the GPU, but everything else is on the GPU, and we get an error.
If this is the case, don't we have to define `eps` using `register_buffer` as well, since it is also involved in the calculation inside `forward`? When `self.eps` is defined as `self.eps = eps`, it will NOT automatically be moved to the GPU when the model is moved to the GPU, if I understand correctly. Then we should get an error when `x = (x-m) / (v+self.eps).sqrt()` is executed, since we are trying to combine a thing on the GPU (`x`) with a thing NOT on the GPU (`eps`).
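To make my concern concrete, here is a minimal sketch of just the normalization step, with `eps` kept as a plain Python float (hypothetical shapes; CPU tensors only, so this does not exercise the GPU question directly, but it shows `eps` entering the expression purely as a scalar operand):

```python
import torch

eps = 1e-5  # plain Python float, not a tensor and not a registered buffer
x = torch.randn(4, 3, 8, 8)          # hypothetical batch: (N, C, H, W)
m = x.mean((0, 2, 3), keepdim=True)  # per-channel mean, shape (1, 3, 1, 1)
v = x.var((0, 2, 3), keepdim=True)   # per-channel variance, shape (1, 3, 1, 1)

# The line in question: eps participates only as a Python scalar in tensor arithmetic
out = (x - m) / (v + eps).sqrt()
print(out.shape)  # torch.Size([4, 3, 8, 8])
```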
```python
class BatchNorm(nn.Module):
    def __init__(self, nf, mom=0.1, eps=1e-5):
        super().__init__()
        # NB: pytorch bn mom is opposite of what you'd expect
        self.mom,self.eps = mom,eps
        self.mults = nn.Parameter(torch.ones (nf,1,1))
        self.adds  = nn.Parameter(torch.zeros(nf,1,1))
        self.register_buffer('vars',  torch.ones(1,nf,1,1))
        self.register_buffer('means', torch.zeros(1,nf,1,1))

    def update_stats(self, x):
        m = x.mean((0,2,3), keepdim=True)
        v = x.var ((0,2,3), keepdim=True)
        self.means.lerp_(m, self.mom)
        self.vars.lerp_ (v, self.mom)
        return m,v

    def forward(self, x):
        if self.training:
            with torch.no_grad(): m,v = self.update_stats(x)
        else: m,v = self.means,self.vars
        x = (x-m) / (v+self.eps).sqrt()
        return x*self.mults + self.adds
```
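For what it's worth, here is a quick check I sketched (repeating the class above so it runs standalone) showing what actually ends up registered on the module: `vars` and `means` appear among the buffers and in the `state_dict`, while `eps` stays an ordinary Python attribute:

```python
import torch
from torch import nn

class BatchNorm(nn.Module):
    def __init__(self, nf, mom=0.1, eps=1e-5):
        super().__init__()
        self.mom, self.eps = mom, eps
        self.mults = nn.Parameter(torch.ones(nf, 1, 1))
        self.adds = nn.Parameter(torch.zeros(nf, 1, 1))
        self.register_buffer('vars', torch.ones(1, nf, 1, 1))
        self.register_buffer('means', torch.zeros(1, nf, 1, 1))

    def update_stats(self, x):
        m = x.mean((0, 2, 3), keepdim=True)
        v = x.var((0, 2, 3), keepdim=True)
        self.means.lerp_(m, self.mom)
        self.vars.lerp_(v, self.mom)
        return m, v

    def forward(self, x):
        if self.training:
            with torch.no_grad(): m, v = self.update_stats(x)
        else:
            m, v = self.means, self.vars
        x = (x - m) / (v + self.eps).sqrt()
        return x * self.mults + self.adds

bn = BatchNorm(3)
buffer_names = {name for name, _ in bn.named_buffers()}
print(buffer_names)              # {'vars', 'means'}
print(type(bn.eps))              # <class 'float'> -- a plain attribute, not a buffer
print('eps' in bn.state_dict())  # False

y = bn(torch.randn(2, 3, 4, 4))  # forward pass runs fine with eps as a float
```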