Calculating the number of input features to the flattening layer

I am a little confused about how the number of inputs to the flattening layer is calculated. Consider the following example from Part 1 (v1) (it's the same as in this tutorial):

import torch.nn as nn

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        # 6 input channels, 16 output channels, 5x5 square convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 5)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # the line in question: why 16 * 5 * 5?
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

Why do we specify the input to fc1 as 16 * 5 * 5?

My understanding is that the layer above outputs 16 matrices of activations, so we could just stack them into the flattening layer. Why multiply that number again by the size of the weight matrix (5x5)?

Second question: in the conv2 layer we specify the number of input channels as 6 and output channels as 16. I take that to mean 6 input matrices and 16 output matrices, but we don't specify anything about their size or dimensions… So does PyTorch dynamically create tensors to fit the incoming tensors/matrices?

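Found the answer: the forward function applies two max-pools, so the output just before the first fc layer is 16x5x5, and flattening it gives 16 * 5 * 5 = 400 features. The 5x5 here is the size of each feature map at that point, not the 5x5 convolution kernel.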
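To make the arithmetic concrete, here is a rough sketch of the shape calculation in plain Python. It assumes a 1x32x32 input image and a 2x2 max-pool after each conv layer; neither is shown in the snippet above, so treat both as assumptions. The helper names conv_out and pool_out are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial size after max pooling; stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 32                  # assumed input: 1 x 32 x 32
size = conv_out(size, 5)   # conv1, 5x5 kernel: 32 -> 28
size = pool_out(size, 2)   # assumed 2x2 max-pool: 28 -> 14
size = conv_out(size, 5)   # conv2, 5x5 kernel: 14 -> 10
size = pool_out(size, 2)   # assumed 2x2 max-pool: 10 -> 5

channels = 16              # output channels of conv2
flat_features = channels * size * size
print(size, flat_features)  # 5 400
```

With a different input size the last feature map would no longer be 5x5, and the first argument of fc1 would have to change accordingly.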

The output shapes of the individual layers can be visualized with torchsummary, similar to the model summary in Keras.
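If you would rather not install anything extra, roughly the same information can be collected with forward hooks. This is only a sketch, not what torchsummary does internally: the forward function below is my assumption (ReLU plus a 2x2 max-pool after each conv), and so is the 1x32x32 input. It also touches on the second question: the conv weights have shape (out_channels, in_channels, 5, 5) and carry no image size, so the spatial size of the output is only determined when an input is passed through.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Assumed forward pass: ReLU + 2x2 max-pool after each conv layer.
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()
shapes = {}  # layer name -> (input shape, output shape)


def record(name):
    def hook(module, inputs, output):
        shapes[name] = (tuple(inputs[0].shape), tuple(output.shape))
    return hook


handles = [m.register_forward_hook(record(n)) for n, m in net.named_children()]
with torch.no_grad():
    net(torch.zeros(1, 1, 32, 32))  # assumed input: batch of one 1x32x32 image
for h in handles:
    h.remove()

for name, (in_shape, out_shape) in shapes.items():
    print(f"{name}: {in_shape} -> {out_shape}")

# The conv weights depend only on channels and kernel size, not on image size.
print(tuple(net.conv2.weight.shape))
```

Because the pooling is done with the functional API, the hooks see the conv outputs before pooling (28x28 and 10x10). The 16x5x5 shows up as the flattened input to fc1, which has 400 features.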