Changes in architecture using learn.set_data

Hello,
I was following the Dog Breed Identification task from Lesson 2.
In the video, Jeremy introduces a technique in which we first train the model on smaller images (sz=224) and then use learn.set_data to continue training on larger images (sz=299).
I was wondering what changes this forces in the architecture.
I understand that convolution operations don't require fixed-size images as input, but the output size does depend on the input size. So after all the convolutional layers, when the Flatten layer is applied, the output feature size should be different for sz=224 and sz=299.
But the output from the Flatten layer is fixed at 4096 features, as shown by learn.summary():

(8): AdaptiveConcatPool2d(
  (ap): AdaptiveAvgPool2d(output_size=(1, 1))
  (mp): AdaptiveMaxPool2d(output_size=(1, 1))
)
(9): Flatten()
(10): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True)
(11): Dropout(p=0.25)
(12): Linear(in_features=4096, out_features=512, bias=True)
(13): ReLU()
(14): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True)
(15): Dropout(p=0.5)
(16): Linear(in_features=512, out_features=120, bias=True)
(17): LogSoftmax()
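To make this concrete, here is a quick check of my reasoning (using torchvision's resnet34 as a stand-in backbone, not the lesson's exact model), showing that the convolutional output size does depend on the input size:

import torch
from torchvision.models import resnet34

# Stand-in backbone: resnet34 minus its pooling/FC head
body = torch.nn.Sequential(*list(resnet34().children())[:-2])

with torch.no_grad():
    for sz in (224, 299):
        feats = body(torch.randn(1, 3, sz, sz))
        print(sz, feats.shape)
# 224 -> torch.Size([1, 512, 7, 7])
# 299 -> torch.Size([1, 512, 10, 10])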

What am I missing here?
Is AdaptiveConcatPool2d at work here? If so, doesn't the meaning of the layer change from sz=224 to sz=299? What exactly does AdaptiveConcatPool2d do?
Also, Jeremy pointed out that training with smaller images first and then with larger ones is helpful because it prevents overfitting. How does that work?

Regards,
Ravi

set_data only points the learner to new data; it doesn't make any changes to the architecture.

The way this is achieved is via the pooling operation. Depending on the input size, the feature maps produced by the last layer of the base model will have different spatial sizes, but their count will be the same. Whether we take a max or an average over a 3x3 feature map or a 5x5 one, in each case we get a single number. If we have 10 such maps, we get 10 results for max and 10 for average, and we can concatenate them into a vector of length 20.
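A minimal sketch of that idea (10 feature maps of either spatial size, pooled down to the same fixed-length result):

import torch
from torch import nn

ap, mp = nn.AdaptiveAvgPool2d(1), nn.AdaptiveMaxPool2d(1)

for side in (3, 5):                      # 3x3 and 5x5 feature maps
    x = torch.randn(1, 10, side, side)   # batch of 1, 10 feature maps
    pooled = torch.cat([mp(x), ap(x)], 1)
    print(side, pooled.shape)            # torch.Size([1, 20, 1, 1]) both times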

This is from layers.py:

import torch
from torch import nn

class AdaptiveConcatPool2d(nn.Module):
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1, 1)
        # Adaptive pooling: output spatial size is fixed at `sz` no matter the input size
        self.ap = nn.AdaptiveAvgPool2d(sz)
        self.mp = nn.AdaptiveMaxPool2d(sz)

    def forward(self, x):
        # Concatenate max-pooled and avg-pooled features along the channel dimension
        return torch.cat([self.mp(x), self.ap(x)], 1)
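For example (assuming the lesson's backbone ends in 2048 feature maps, which is what would make Flatten output 4096 = 2 x 2048):

pool = AdaptiveConcatPool2d()
x224 = pool(torch.randn(1, 2048, 7, 7))    # e.g. feature maps from a 224px input
x299 = pool(torch.randn(1, 2048, 10, 10))  # e.g. feature maps from a 299px input
print(x224.shape, x299.shape)              # both torch.Size([1, 4096, 1, 1])

So Flatten always sees 4096 values, whichever image size you set with set_data.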