I was following the Dog Breed Identification task from Lesson 2.
In the video, Jeremy introduces an insight where we first train our model with smaller images (sz=224) and then use learn.set_data to train the model with larger images (sz=299).
I was wondering what changes this forces in the architecture.
I understand that convolution operations don't require fixed-size images as input, but the output size depends on the input size. So after all the convolution layers, when the Flatten layer is applied, the output feature size should be different for sz=224 and sz=299.
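To make my confusion concrete, here is a small sketch (plain Python, just the standard conv-output arithmetic, using the 7x7 stride-2 conv from a ResNet-style stem as a hypothetical example) showing that the spatial output size does change with the input size:

```python
# Spatial output size of a conv layer: floor((size + 2*pad - kernel) / stride) + 1
def conv_out(size, kernel, stride, pad):
    return (size + 2 * pad - kernel) // stride + 1

# First 7x7 stride-2 conv of a ResNet-style backbone, two input sizes:
print(conv_out(224, kernel=7, stride=2, pad=3))  # 112
print(conv_out(299, kernel=7, stride=2, pad=3))  # 150
```

So every conv layer's feature maps are larger for sz=299, and I would expect that difference to survive all the way to the Flatten layer.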
But the output feature size from the Flatten layer is fixed at 4096 features, as shown by learn.summary():
(ap): AdaptiveAvgPool2d(output_size=(1, 1))
(mp): AdaptiveMaxPool2d(output_size=(1, 1))
(10): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True)
(12): Linear(in_features=4096, out_features=512, bias=True)
(14): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True)
(16): Linear(in_features=512, out_features=120, bias=True)
What am I missing here?
Is AdaptiveConcatPool2d at work here? If so, doesn't the layer's meaning change from sz=224 to sz=299? What exactly does AdaptiveConcatPool2d do?
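From the summary above it looks like the layer combines AdaptiveAvgPool2d(1) and AdaptiveMaxPool2d(1). Here is a plain-Python sketch of that idea (my own illustration, not fastai's code; I'm assuming a backbone whose final conv block has 2048 channels, which matches the 4096 = 2*2048 figure in the summary):

```python
# Sketch of what adaptive concat pooling does: collapse each HxW channel
# to ONE value per pooling type, regardless of H and W.
def adaptive_concat_pool(feature_maps):
    """feature_maps: C channels, each an HxW grid of floats (any H, W).
    Returns a flat list of 2*C features: per-channel max, then per-channel mean."""
    maxes = [max(v for row in fm for v in row) for fm in feature_maps]
    means = [sum(v for row in fm for v in row) / (len(fm) * len(fm[0]))
             for fm in feature_maps]
    return maxes + means

c = 2048  # assumed channel count of the final conv block
feats_224 = adaptive_concat_pool([[[0.5] * 7] * 7 for _ in range(c)])    # 7x7 maps at sz=224
feats_299 = adaptive_concat_pool([[[0.5] * 10] * 10 for _ in range(c)])  # 10x10 maps at sz=299
print(len(feats_224), len(feats_299))  # 4096 4096
```

If this is right, the adaptive pools always emit a 1x1 output per channel, so Flatten sees 2*2048 = 4096 features whether the final feature maps are 7x7 or 10x10, and the downstream Linear layers never notice the input size changed.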
Also, Jeremy pointed out that training with smaller images first and then with larger images is helpful because it prevents overfitting. How does that work?