Question about VGG16's CNN architecture


The output shapes of block1_conv1 and block1_conv2 are the same: (64, 224, 224).
Does this mean block1_conv2 has only ONE filter?
From what I heard in the lesson, one filter produces an output the same size as its input.


Layer (type)                  Output Shape          Param #  Connected to
input_1 (InputLayer)          (None, 3, 224, 224)   0
block1_conv1 (Convolution2D)  (None, 64, 224, 224)  1792     input_1[0][0]
block1_conv2 (Convolution2D)  (None, 64, 224, 224)  36928    block1_conv1[0][0]
block1_pool (MaxPooling2D)    (None, 64, 112, 112)  0        block1_conv2[0][0]

(Dennis Sakva) #2

It has 64 filters. Each filter generates one feature.


So only 1/64 of the output of block1_conv1 becomes the input of ONE filter of block1_conv2?
I.e. 1 filter takes care of only 1x224x224?

(Dennis Sakva) #4

Each filter of the next block takes all channels from the previous block as input and produces one feature (channel) as output.
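
To make the shapes concrete, here is a minimal NumPy sketch, assuming stride 1 and zero padding of 1 (which is what keeps the frame size unchanged). The function name `conv2d_same` and the small sizes (4 input channels, 5 filters, 6x6 frames) are made up for illustration and are not taken from Keras; VGG16's block1_conv2 would be 64 input channels, 64 filters, 224x224.

```python
import numpy as np

def conv2d_same(x, weights, biases):
    """Naive 'same' convolution (cross-correlation, as used in CNNs).

    x:       (C_in, H, W)
    weights: (C_out, C_in, kH, kW)  one (C_in, kH, kW) filter per output channel
    biases:  (C_out,)
    returns: (C_out, H, W)
    """
    c_out, c_in, kh, kw = weights.shape
    _, h, w = x.shape
    # Zero-pad height and width so the output frame matches the input frame.
    padded = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, h, w))
    for f in range(c_out):
        for i in range(h):
            for j in range(w):
                # The patch spans ALL input channels, not just one of them.
                patch = padded[:, i:i + kh, j:j + kw]
                out[f, i, j] = np.sum(patch * weights[f]) + biases[f]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))           # 4 input channels
weights = rng.standard_normal((5, 4, 3, 3))  # 5 filters, each 4x3x3
biases = rng.standard_normal(5)

y = conv2d_same(x, weights, biases)
print(y.shape)  # (5, 6, 6): one output channel per filter
```

The number of output channels comes from the number of filters, while the depth of each filter comes from the number of input channels.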


So how does it handle all the channels? Sum them up?
For 1 filter:
input: 64x224x224, output: 1x224x224

From the lesson, I understand that the 3x3 filter will not change the frame size: for each overlapping 3x3 area of the input it does a dot-product-like operation and gets 1 cell in the output.
But how does it tackle multiple channels? Get the 1 cell for each channel, then sum them up?

(Dennis Sakva) #6

Yep. It’s a weighted sum of all channels. Each filter has HxWxD weights (plus a bias), where HxW is the filter size (3x3) and D is the depth, which is equal to the number of input channels.
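
As a quick sanity check, the Param # column in the summary above can be reproduced from this formula. This is just arithmetic; the helper name `conv_params` is made up for illustration, and it assumes one bias per filter.

```python
def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases for a conv layer, assuming one bias per filter."""
    weights = kernel_h * kernel_w * in_channels * n_filters
    biases = n_filters
    return weights + biases

block1_conv1 = conv_params(3, 3, 3, 64)    # 3 input channels (RGB)
block1_conv2 = conv_params(3, 3, 64, 64)   # 64 input channels from block1_conv1
print(block1_conv1, block1_conv2)  # 1792 36928
```

Both numbers match the summary, which is consistent with block1_conv2 having 64 filters of depth 64 rather than a single filter.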


If it is H x W x D, then within one filter the 3x3 weights could be completely different for each channel.
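
That can be checked with a small NumPy sketch, assuming stride 1 and zero padding of 1. The helper `corr2d_same` and the sizes (4 channels, 6x6 frames) are hypothetical, chosen only to keep the example short. It builds one filter with an independent random 3x3 kernel per channel, filters each channel separately, sums the results, and compares that with the all-channels-at-once computation.

```python
import numpy as np

def corr2d_same(channel, kernel):
    """Single-channel 'same' cross-correlation with zero padding."""
    kh, kw = kernel.shape
    h, w = channel.shape
    padded = np.pad(channel, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))     # 4 input channels
filt = rng.standard_normal((4, 3, 3))  # ONE filter: a separate 3x3 kernel per channel
bias = 0.5

# Filter each channel with its own 3x3 kernel, then sum over channels.
per_channel = [corr2d_same(x[c], filt[c]) for c in range(4)]
summed = np.sum(per_channel, axis=0) + bias

# Same thing computed in one go over the full 4x3x3 patch.
padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
direct = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        direct[i, j] = np.sum(padded[:, i:i + 3, j:j + 3] * filt) + bias

kernels_differ = not np.allclose(filt[0], filt[1])
match = np.allclose(summed, direct)
print(kernels_differ, match)  # True True
```

So the per-channel 3x3 kernels are free to differ, and the filter's single output channel is the sum of the per-channel results plus the bias.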