Fastbook/Chapter-13 Convolutions

In chapter-13, under section 1.2.1, we create a simple_cnn like so:

simple_cnn = sequential(
    conv(1 ,4),            #14x14
    conv(4 ,8),            #7x7
    conv(8 ,16),           #4x4
    conv(16,32),           #2x2
    conv(32,2, act=False), #1x1
    Flatten(),
)

The complete training loop looks like:

path = untar_data(URLs.MNIST_SAMPLE)
mnist = DataBlock((ImageBlock(cls=PILImageBW), CategoryBlock), 
                  get_items=get_image_files, 
                  splitter=GrandparentSplitter(),
                  get_y=parent_label)

dls = mnist.dataloaders(path)
learn = Learner(dls, simple_cnn, loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.01)
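
As a side note (not part of the book's code), calling learn.summary() on the Learner above prints every layer together with its output shape and parameter count, which is an easy way to confirm the #14x14 ... #1x1 comments:

learn.summary()   # lists each layer with its output shape and parameter count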

conv is defined as:

def conv(ni, nf, ks=3, act=True):
    res = nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)
    if act: res = nn.Sequential(res, nn.ReLU())
    return res
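
Not from the book, just a quick shape check (assuming from fastai.vision.all import * plus the conv and simple_cnn definitions above): feeding a fake batch through the model shows each stride-2 conv halving the spatial size, which is where the #14x14 ... #1x1 comments come from.

import torch

x = torch.randn(64, 1, 28, 28)              # fake batch: 64 single-channel 28x28 images
for layer in simple_cnn:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# (64, 4, 14, 14) -> (64, 8, 7, 7) -> (64, 16, 4, 4) -> (64, 32, 2, 2) -> (64, 2, 1, 1)
# and Flatten finally gives (64, 2)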

Here’s a question:
In the simple_cnn, why are we going from 32 channels to 2 channels in the final conv layer?
In other words, why does this layer conv(32,2, act=False), #1x1 go from 32 channels to 2 channels and not from 32 channels to 64 channels?

Don’t we usually double the number of channels when we have a stride-2 convolution? Is there an empirical or theoretical reason why the final layer goes from 32 channels to 2, please?


(Update)
In case anybody else is wondering: it’s because the final output then has shape (64, 2), assuming a batch size of 64. This is exactly what we want, because those two activations (after the softmax that F.cross_entropy applies internally) can be interpreted as the probabilities of the 2 classes, 3 and 7, which make up the MNIST_SAMPLE dataloaders.
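
A quick way to see it in code (a sketch, assuming the Learner trained above): grab one batch and run it through the model; the two columns of the (64, 2) output line up with dls.vocab, which should be ['3', '7'] here. The raw outputs are activations (logits), and F.cross_entropy applies softmax internally to turn them into probabilities.

xb, yb = dls.one_batch()
preds = learn.model(xb)
print(preds.shape)              # torch.Size([64, 2]) with the default batch size
print(dls.vocab)                # ['3', '7'] -> column 0 scores '3', column 1 scores '7'
probs = preds.softmax(dim=1)    # convert the activations to per-class probabilities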
