Hello everyone, great to meet you!
I have been writing a custom implementation of DenseNet by adding a few layers after the final Dense Block. As suggested by Andrej Karpathy, I am testing the changes by overfitting the network on a small number of training examples (32 images).
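For reference, this is roughly what I mean by the sanity check, sketched on synthetic data (the random tensors and the tiny MLP below are stand-ins, not my actual pipeline):

```python
import torch
import torch.nn as nn

# Stand-ins: 32 random "images" and a small MLP, purely for illustration
torch.manual_seed(0)
images = torch.randn(32, 2352)         # 32 inputs, already flattened
labels = torch.randint(0, 10, (32,))   # fake labels for 10 classes

model = nn.Sequential(nn.Linear(2352, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

print(loss.item())  # should approach zero if the network can memorise the batch
```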
Whilst doing this, I came across some strange behaviour: if I include two nn.BatchNorm1d layers in the top of my network, it becomes much harder to overfit on the 32 images.
The code for the network:
```python
from collections import OrderedDict

import torch.nn as nn


class BatchNormTest(nn.Module):
    def __init__(self, top_dense_features=512, top_drop_rate=0, num_classes=1000):
        super().__init__()
        # Top
        joined_num_features = 2352
        self.top = nn.Sequential(OrderedDict([
            ('top_dense0', nn.Linear(joined_num_features, top_dense_features)),
            ('top_norm0', nn.BatchNorm1d(top_dense_features)),
            ('top_relu0', nn.ReLU(inplace=True)),
            ('top_dropout0', nn.Dropout(p=top_drop_rate, inplace=True)),
            ('top_dense1', nn.Linear(top_dense_features, top_dense_features)),
            ('top_norm1', nn.BatchNorm1d(top_dense_features)),
            ('top_relu1', nn.ReLU(inplace=True)),
            ('top_dropout1', nn.Dropout(p=top_drop_rate, inplace=True)),
            ('top_output', nn.Linear(top_dense_features, num_classes)),
        ]))

    def forward(self, x):
        # Flatten all dimensions except the batch dimension
        x = x.view(x.shape[0], -1)
        x_out = self.top(x)
        return x_out
```
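As a quick sanity check on shapes (the batch size and class count here are arbitrary, not from my experiment), I would expect this forward pass to work:

```python
import torch

model = BatchNormTest(num_classes=10)
x = torch.randn(32, 2352)   # any batch that flattens to joined_num_features
out = model(x)
print(out.shape)            # torch.Size([32, 10])
```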
```python
model = BatchNormTest(num_classes=data.c)  # data.c is the number of classes (fastai)
learn = Learner(data, model, wd=0)
learn.fit(50, lr=1e-2)
```
- without the nn.BatchNorm1d layers, after 50 epochs: train loss 0.683759
- with the nn.BatchNorm1d layers, after 50 epochs: train loss 3.11638
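I am not sure whether this explains the gap, but one property of BatchNorm that seems relevant is that in train mode it normalises with the current batch's statistics, while in eval mode it uses the running statistics, so the loss on the very same 32 images can differ between the two modes. A sketch of how I would probe that, again on synthetic inputs:

```python
import torch
import torch.nn as nn

# Synthetic batch purely to probe the two BatchNorm modes
torch.manual_seed(0)
x = torch.randn(32, 2352)
y = torch.randint(0, 10, (32,))

model = BatchNormTest(num_classes=10)
criterion = nn.CrossEntropyLoss()

model.train()                   # BatchNorm uses the batch's own statistics
with torch.no_grad():
    train_mode_loss = criterion(model(x), y)

model.eval()                    # BatchNorm uses its running statistics
with torch.no_grad():
    eval_mode_loss = criterion(model(x), y)

print(train_mode_loss.item(), eval_mode_loss.item())
```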
Is this behaviour expected from nn.BatchNorm1d?