CNN architecture for MNIST

In Jeremy's tutorial "What is torch.nn really?" there is an example towards the end where he creates a CNN for MNIST. In nn.Conv2d he sets (in_channels, out_channels) to (1, 16), (16, 16), (16, 10). I get that the last one has to be 10 because there are 10 classes and we want 'probabilities' for each class. But why go up to 16 first? How do you choose this value? And why not just go from 1 to 10, 10 to 10, and 10 to 10? Does this have to do with the kernel_size and stride?

All of the images are 28x28, so I can't see any connection between that size and 16 either.

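import torch.nn as nn
import torch.nn.functional as F

class Mnist_CNN(nn.Module):
    def __init__(self):
        # nn.Module must be initialised before submodules are assigned
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        # Reshape flat inputs to (batch, channels, height, width)
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))  # 28x28 -> 14x14
        xb = F.relu(self.conv2(xb))  # 14x14 -> 7x7
        xb = F.relu(self.conv3(xb))  # 7x7 -> 4x4
        xb = F.avg_pool2d(xb, 4)     # 4x4 -> 1x1
        # Flatten to (batch, 10): one score per class
        return xb.view(-1, xb.size(1))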

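The number of hidden channels is a hyperparameter you choose; it is not derived from the image size, kernel_size, or stride. If you go up to 16 channels, the model has more filters and more parameters, so its capacity increases and it should be able to fit the data better.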
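To put a number on "capacity", here is a rough sketch that counts the learnable parameters for a hidden width of 16 versus 10. The helpers `make_cnn` and `count_params` are names I made up for illustration, and I use nn.AvgPool2d(4) plus nn.Flatten in place of the functional calls in the original forward.

```python
import torch.nn as nn

def make_cnn(width: int) -> nn.Sequential:
    """Three-conv MNIST net; `width` is the hidden channel count (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv2d(1, width, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(width, 10, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AvgPool2d(4),
        nn.Flatten(),
    )

def count_params(model: nn.Module) -> int:
    """Total number of learnable weights and biases."""
    return sum(p.numel() for p in model.parameters())

params_16 = count_params(make_cnn(16))
params_10 = count_params(make_cnn(10))
print(params_16, params_10)  # 3930 1920
```

So the 16-channel version has roughly twice as many parameters, which is what "more capacity" means here. Whether that helps on MNIST is an empirical question.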

Have a look at other architectures such as ResNet; there you will see that the number of channels usually increases with depth.
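On the kernel_size and stride part of the question: those settings control the spatial size of the feature maps, while out_channels only sets how many feature maps there are. A small sketch that traces the shapes through the layers from the question, using a dummy all-zeros image as input:

```python
import torch
import torch.nn as nn

# One dummy MNIST-sized image: (batch, channels, height, width)
x = torch.zeros(1, 1, 28, 28)

layers = [
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.AvgPool2d(4),
]

shapes = []
for layer in layers:
    x = layer(x)
    shapes.append(tuple(x.shape))

print(shapes)
# [(1, 16, 14, 14), (1, 16, 7, 7), (1, 10, 4, 4), (1, 10, 1, 1)]
```

Each stride-2 convolution halves the height and width (28, 14, 7, 4) no matter how many channels you pick. Swapping 16 for 10 or 32 changes only the second dimension.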

That said, the easiest way to find out is to train one network with 16 hidden channels and another with 10 and compare them (final loss, behavior over training, etc.). This is a nice small experiment.
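A minimal sketch of that experiment could look like the following. Note the assumptions: `make_cnn` and `train` are hypothetical helpers, the learning rate and momentum are values I picked, and the batch is random noise standing in for MNIST so the snippet runs without a download. Swap in the real training data to get meaningful numbers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_cnn(width):
    """Same three-conv net, with a configurable hidden channel count (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv2d(1, width, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(width, 10, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AvgPool2d(4),
        nn.Flatten(),
    )

def train(width, xb, yb, steps=20, lr=0.1, seed=0):
    """Train on one fixed batch and return the loss recorded at every step."""
    torch.manual_seed(seed)  # same seed so both runs are repeatable
    model = make_cnn(width)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    losses = []
    for _ in range(steps):
        loss = F.cross_entropy(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# Placeholder data (NOT real MNIST): random images and random labels
torch.manual_seed(0)
xb = torch.rand(64, 1, 28, 28)
yb = torch.randint(0, 10, (64,))

results = {width: train(width, xb, yb) for width in (16, 10)}
for width, losses in results.items():
    print(f"width={width}: first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

With real data you would loop over mini-batches for a few epochs and also track validation loss, since a lower training loss alone does not tell you which width generalizes better.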
