In chapter 13, under section 1.2.1, we create a simple_cnn
like so:
simple_cnn = sequential(
    conv(1, 4),             # 14x14
    conv(4, 8),             # 7x7
    conv(8, 16),            # 4x4
    conv(16, 32),           # 2x2
    conv(32, 2, act=False), # 1x1
    Flatten(),
)
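To sanity-check the #14x14 … #1x1 comments, here is a minimal pure-Python sketch (no PyTorch needed) applying the standard conv output-size formula, floor((n + 2*padding - ks) / stride) + 1, with the ks=3, padding=1, stride=2 settings used by the conv helper below, starting from a 28x28 MNIST image:

```python
def conv_out_size(n, ks=3, stride=2, padding=1):
    """Spatial size after one conv with the given kernel/stride/padding."""
    return (n + 2 * padding - ks) // stride + 1

n = 28        # MNIST images are 28x28
sizes = []
for _ in range(5):  # five stride-2 convs in simple_cnn
    n = conv_out_size(n)
    sizes.append(n)

print(sizes)  # [14, 7, 4, 2, 1]
```

Each stride-2 conv roughly halves the spatial size, which is why the comments count down 14, 7, 4, 2, 1.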
The complete training loop looks like:
path = untar_data(URLs.MNIST_SAMPLE)
mnist = DataBlock((ImageBlock(cls=PILImageBW), CategoryBlock),
                  get_items=get_image_files,
                  splitter=GrandparentSplitter(),
                  get_y=parent_label)
dls = mnist.dataloaders(path)
learn = Learner(dls, simple_cnn, loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.01)
conv is defined as:
def conv(ni, nf, ks=3, act=True):
    res = nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)
    if act: res = nn.Sequential(res, nn.ReLU())
    return res
Here’s a question: in simple_cnn, why are we going from 32 channels to 2 channels in the final conv layer?
In other words, why is this layer conv(32, 2, act=False), #1x1 going from 32 channels to 2 channels, and not from 32 channels to 64 channels? Don’t we usually double the number of channels when we have a stride-2 convolution? Is there an empirical or theoretical reason why the final layer goes from 32 to 2 channels?
(Update)
In case anybody else is wondering: it’s because the final output then has shape (64, 2), assuming a batch size of 64. This is exactly what we want, because the two activations per image can be interpreted (after softmax) as the probabilities of the 2 classes, 3 and 7, which make up the MNIST_SAMPLE dataloaders.
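To make that concrete, here is a small pure-Python sketch (no torch required) of what happens to one row of that (64, 2) output: F.cross_entropy applies a softmax over the two activations, turning them into probabilities for the two classes. The example logits below are made-up numbers for illustration.

```python
import math

def softmax(logits):
    """Turn raw activations (logits) into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One row of the (64, 2) model output: one activation per class (3 and 7)
row = [2.0, -1.0]   # hypothetical logits
probs = softmax(row)
print(probs)  # two probabilities summing to 1; the larger logit wins
```

So with 2 output channels and a 1x1 spatial size, Flatten() gives exactly one logit per class, which is what cross-entropy loss expects.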