Question about VGG16's CNN architecture

iaming · November 8, 2017, 1:26pm

The output shapes of block1_conv1 and block1_conv2 are the same: (64, 224, 224).
Does this mean block1_conv2 has only ONE filter?
From what I heard in the lesson, one filter produces an output the same size as its input.

vgg.summary():

Layer (type)                   Output Shape           Param #   Connected to
input_1 (InputLayer)           (None, 3, 224, 224)    0
block1_conv1 (Convolution2D)   (None, 64, 224, 224)   1792      input_1[0][0]
block1_conv2 (Convolution2D)   (None, 64, 224, 224)   36928     block1_conv1[0][0]
block1_pool (MaxPooling2D)     (None, 64, 112, 112)   0         block1_conv2[0][0]

SakvaUA · November 8, 2017, 1:31pm

It has 64 filters. Each filter generates one feature map (one output channel).

iaming · November 8, 2017, 11:36pm

So only 1/64 of the output of block1_conv1 becomes the input of ONE filter of block1_conv2?
I.e., one filter takes care of only 1x224x224?

SakvaUA · November 9, 2017, 12:37pm

Each filter of the next block takes all channels from the previous block as input and produces one feature map (channel) as output.
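
To make the shapes concrete, here is a minimal NumPy sketch of a "same"-padded 3x3 convolution layer. It is not how Keras implements it; the function name conv2d_same and the small sizes (4 input channels, 6 filters, 8x8 frames) are made up for illustration, standing in for the 64 channels, 64 filters and 224x224 frames of block1_conv2.

```python
import numpy as np

def conv2d_same(x, filters, biases):
    """x: (C_in, H, W); filters: (C_out, C_in, kh, kw); biases: (C_out,)."""
    c_in, h, w = x.shape
    c_out, _, kh, kw = filters.shape
    # Zero-pad height and width so the output keeps the input's H x W.
    padded = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty((c_out, h, w))
    for f in range(c_out):
        for i in range(h):
            for j in range(w):
                # The patch spans ALL input channels at this position.
                patch = padded[:, i:i + kh, j:j + kw]
                out[f, i, j] = np.sum(patch * filters[f]) + biases[f]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))           # 4 input channels (illustrative)
filters = rng.standard_normal((6, 4, 3, 3))  # 6 filters, each covering all 4 channels
biases = np.zeros(6)

y = conv2d_same(x, filters, biases)
print(y.shape)  # (6, 8, 8)
```

The number of output channels equals the number of filters, no matter how many input channels there are. That is why block1_conv1 (3 channels in) and block1_conv2 (64 channels in) both report 64 output channels.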

iaming · November 9, 2017, 1:12pm

So how does it handle all the channels? Sum them up?
For 1 filter:
input: 64x224x224, output: 1x224x224

From the lesson, I understand that the 3x3 filter does not change the frame size: for each overlapping 3x3 area of the input it does a dot-product-like operation and gets 1 cell in the output.
But how does it tackle multiple channels? Get the 1 cell for each channel, then sum them up?

SakvaUA · November 9, 2017, 1:15pm

Yep. It’s a weighted sum over all channels. Each filter has HxWxD weights (plus one bias), where HxW is the filter size (3x3) and D is the depth, which equals the number of input channels.
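
As a quick sanity check, the HxWxD rule reproduces the Param # column of the summary above. The helper name conv_params is made up for this sketch, and it assumes one bias per filter.

```python
def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    # Each filter holds kernel_h * kernel_w * in_channels weights.
    weights = kernel_h * kernel_w * in_channels * n_filters
    # Assumption: one bias per filter.
    biases = n_filters
    return weights + biases

print(conv_params(3, 3, 3, 64))   # block1_conv1: 1792
print(conv_params(3, 3, 64, 64))  # block1_conv2: 36928
```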

iaming · November 12, 2017, 4:42pm

If it's H * W * D, then within one filter the 3x3 weights can be completely different for each input channel.
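
A small NumPy sketch of that point, with made-up sizes (3 channels, 5x5 frames) and a hypothetical helper correlate_same: applying a separate 3x3 kernel to each channel and summing the results gives the same output as applying the whole H x W x D filter at once.

```python
import numpy as np

def correlate_same(channel, kernel):
    """Slide one 2D kernel over one 2D channel with zero padding."""
    h, w = channel.shape
    kh, kw = kernel.shape
    padded = np.pad(channel, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5, 5))     # 3 input channels
filt = rng.standard_normal((3, 3, 3))  # ONE filter: its own 3x3 kernel per channel

# Per-channel view: each channel gets its own 3x3 kernel, then the maps are summed.
summed = np.sum([correlate_same(x[c], filt[c]) for c in range(3)], axis=0)

# All-at-once view: one 3x3x3 patch times the whole filter at each position.
padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
direct = np.array([[np.sum(padded[:, i:i + 3, j:j + 3] * filt)
                    for j in range(5)] for i in range(5)])

print(np.allclose(summed, direct))       # True
print(np.array_equal(filt[0], filt[1]))  # False: the per-channel kernels differ
```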