Output of frozen layers

I am running into some behavior I can’t explain, and was hoping for some help. I am using fastai 1.0.49 and pytorch 1.0.1 on linux.
I created a notebook gist to explain my issue here: gist

To start, let me define a simple model that reproduces the issue.

class WTTest(nn.Module):
    def __init__(self, num_classes):
        super().__init__()  # required before registering submodules
        self.base = create_body(models.alexnet)
        for p in self.base.parameters(): p.requires_grad = False
        self.head = create_head(256*2, num_classes)
    def forward(self, x):
        return self.head(self.base(x))

learn = Learner(data, WTTest(data.c))

The issue is this: if I grab some data x and call learn.model.base(x) to get the activations of the base network, train the network for any length of time with learn.fit_one_cycle(1,1e-3), and then call learn.model.base(x) again, the two sets of activations are different, despite the fact that the base layers are frozen. However, if I compare the base layer weights before and after training, they are identical. I can’t figure out how the activations are changing when both the input and the weights stay the same.
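To make the comparison concrete, here is a minimal pure-PyTorch sketch of the check I’m doing (a tiny stand-in net instead of the actual AlexNet base, and no fastai training loop): with frozen weights and no training in between, the same input should give byte-identical activations.

```python
import torch
import torch.nn as nn

# Tiny stand-in for the frozen base (hypothetical, not the real AlexNet body)
torch.manual_seed(0)
base = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
for p in base.parameters():
    p.requires_grad = False
base.eval()

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    act_before = base(x)
    act_after = base(x)  # no training step in between here

# On CPU, with nothing happening in between, these match exactly
print(torch.equal(act_before, act_after))
```

In my actual setup the only difference is that a `learn.fit_one_cycle` call sits between the two forward passes, and that is when the activations start to drift.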

The activations from before and after training don’t differ all that much (usually on the order of 0.01), but the more epochs I train, the more they diverge.

I originally thought it was an issue with batch norm statistics, but AlexNet has no batchnorm layers.
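For reference, this is roughly how I ruled that out (sketched on a tiny stand-in model; the same loop works on the AlexNet base):

```python
import torch.nn as nn

# Hypothetical stand-in model; replace with learn.model.base in practice
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.MaxPool2d(2))

# _BatchNorm is the common parent of BatchNorm1d/2d/3d
bn_layers = [m for m in model.modules()
             if isinstance(m, nn.modules.batchnorm._BatchNorm)]
print(len(bn_layers))  # 0 -> no batchnorm running stats to blame
```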

Did you put your model in eval mode both times? I don’t know what could be the reason (I initially thought of BatchNorm too).

I had been (though not in the gist); I just added calls to learn.model.eval() in both locations and the result is the same: the activations still change slightly.

How slightly? Are you on GPU?

I am on GPU. For example, after training one epoch a single feature map in the output goes from:

tensor([[0.6377, 0.0000, 0.0000],
        [1.7095, 1.3126, 0.0102],
        [0.0000, 0.0000, 0.0000]], device='cuda:0')

to:

tensor([[0.6377, 0.0000, 0.0000],
        [1.7083, 1.3123, 0.0083],
        [0.0000, 0.0000, 0.0000]],

Edit: and after training for 10 epochs, the resulting output becomes:

tensor([[0.6370, 0.0000, 0.0000],
        [1.7006, 1.3142, 0.0000],
        [0.0000, 0.0000, 0.0000]],

Did you check that your weights are exactly the same as before? With a save of the model, then a load, and checking with torch.allclose?

I checked by saving the state dict of the base before and after, and running:

for key in prev_sd.keys():
    if not torch.equal(prev_sd[key], post_sd[key]):
        print(key)

No keys were printed.
I just tried again using torch.allclose instead, and got the same result.

Are you sure you don’t have any kind of augmentation on your data?

Not using augmentation. In addition, I am only grabbing a batch of data once (x,y = next(iter(data.valid_dl))) and using the same batch for both tests.

I’ve done some more tests, and this only seems to occur when using AlexNet or VGG16/19 as the base network. I tried resnets and densenets, and it doesn’t happen in those cases.

And if you do the predictions ten times in a row without doing training in the middle, do you always get the same exact result?

Yes, without training in the middle I always get identical results.
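To be thorough, this is the repeated-prediction check I ran, sketched in pure PyTorch (tiny stand-in net; in the real test it is learn.model.base on the fixed batch):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU()).eval()
x = torch.randn(1, 3, 8, 8)

# Ten forward passes in a row, no training in between
with torch.no_grad():
    ref = model(x)
    same = all(torch.equal(model(x), ref) for _ in range(10))
print(same)  # identical every time when nothing runs in between
```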