I am a little confused about how the number of inputs to the flattening layer is calculated. Consider the following code from the Part 1 (v1) example (it's the same as in this tutorial):
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
Why do we specify the input to fc1 here as 16 * 5 * 5?
What I understand is that the layer above outputs 16 matrices of activations, so we could just stack them into the flattening layer. Why multiply that number again by the size of the weight matrix (5x5)?
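To make the question concrete, here is a rough sketch that traces the spatial size by hand. It assumes a 32x32 input and a 2x2 max pool after each convolution, as I believe the tutorial's forward method does; neither appears in the snippet above, so both are assumptions. `conv_out` is a hypothetical helper, not a PyTorch function. If those assumptions hold, the 5x5 in fc1 may be the size of each of the 16 output matrices rather than the kernel size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # assumed input height/width
size = conv_out(size, kernel=5)             # after conv1 (5x5 kernel)
size = conv_out(size, kernel=2, stride=2)   # after an assumed 2x2 max pool
size = conv_out(size, kernel=5)             # after conv2 (5x5 kernel)
size = conv_out(size, kernel=2, stride=2)   # after an assumed 2x2 max pool

channels = 16                               # output channels of conv2
flat_features = channels * size * size      # values handed to fc1 per sample
print(size, flat_features)                  # 5 400
```

Under these assumptions each of the 16 matrices is 5x5, so flattening gives 16 * 5 * 5 = 400 values, which would match the first argument of fc1.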
Second question: in the conv2 layer we specify the number of input channels as 6 and the number of output channels as 16. I take that to mean 6 input matrices and 16 output matrices, but we are not specifying anything about their size or dimensions… So does PyTorch dynamically create tensors for the incoming tensors/matrices?
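As a sanity check on this second question, the sketch below counts the parameters a layer like conv2 would hold and computes the output size for a few input sizes. It assumes the usual weight layout of (out_channels, in_channels, kernel, kernel) plus one bias per output channel, with stride 1 and no padding; both helper names are hypothetical. If this is right, the layer's weights do not depend on the input size at all, and only the output size changes with the input.

```python
def conv2d_param_count(in_channels, out_channels, kernel):
    """Learnable values in a square-kernel conv layer (hypothetical helper)."""
    weight = out_channels * in_channels * kernel * kernel
    bias = out_channels
    return weight + bias

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution window (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1

# Parameter count for a layer like conv2: no input height/width involved.
params = conv2d_param_count(in_channels=6, out_channels=16, kernel=5)

# The same 5x5 kernel applied to inputs of different (arbitrary) sizes.
out_sizes = {h: conv_out(h, kernel=5) for h in (14, 20, 32)}

print(params)     # 2416
print(out_sizes)  # {14: 10, 20: 16, 32: 28}
```

The first fully connected layer is different: its input size is fixed when the layer is created, which seems to be why 16 * 5 * 5 has to be written out explicitly.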