Summary information wrong with custom head

I’m passing an alexnet model from the pretrainedmodels package into create_cnn, keeping the default alexnet classifier head rather than fastai’s adaptive pooling head.

Does anyone know why the summary information is incorrect even though the printed model looks correct and the model seems to train properly? I’m having a bit of trouble following how the summary code works here. (I’m using the dogs/cats breeds dataset, which has 37 classes.) For example:

from collections import OrderedDict
from fastai.vision import *   # ImageDataBunch, create_cnn, Flatten, children, etc.
import pretrainedmodels

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=64
                                  ).normalize(imagenet_stats)

arch = pretrainedmodels.__dict__['alexnet'](num_classes=1000, pretrained='imagenet')
arch.last_linear.out_features = data.c

model = nn.Sequential(OrderedDict([
    ('features', arch._features),
    ('classifier', nn.Sequential(Flatten(), *children(arch)[1:]))
]))

learn = create_cnn(data, lambda *args: model, metrics=error_rate, custom_head=model.classifier)

# model says 37 classes in last linear
print(learn.model)
# summary incorrectly says 1000 classes in last linear
print(learn.summary())

Output:

Sequential(
  (0): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
      (1): ReLU(inplace)
      (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): ReLU(inplace)
      (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): ReLU(inplace)
      (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): ReLU(inplace)
      (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (11): ReLU(inplace)
      (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
  )
  (1): Sequential(
    (0): Flatten()
    (1): Dropout(p=0.5)
    (2): Linear(in_features=9216, out_features=4096, bias=True)
    (3): ReLU(inplace)
    (4): Dropout(p=0.5)
    (5): Linear(in_features=4096, out_features=4096, bias=True)
    (6): ReLU(inplace)
    (7): Linear(in_features=4096, out_features=37, bias=True)
  )
)
======================================================================
Layer (type)         Output Shape         Param #    Trainable 
======================================================================
Conv2d               [1, 64, 55, 55]      23,296     False     
______________________________________________________________________
ReLU                 [1, 64, 55, 55]      0          False     
______________________________________________________________________
MaxPool2d            [1, 64, 27, 27]      0          False     
______________________________________________________________________
Conv2d               [1, 192, 27, 27]     307,392    False     
______________________________________________________________________
ReLU                 [1, 192, 27, 27]     0          False     
______________________________________________________________________
MaxPool2d            [1, 192, 13, 13]     0          False     
______________________________________________________________________
Conv2d               [1, 384, 13, 13]     663,936    False     
______________________________________________________________________
ReLU                 [1, 384, 13, 13]     0          False     
______________________________________________________________________
Conv2d               [1, 256, 13, 13]     884,992    False     
______________________________________________________________________
ReLU                 [1, 256, 13, 13]     0          False     
______________________________________________________________________
Conv2d               [1, 256, 13, 13]     590,080    False     
______________________________________________________________________
ReLU                 [1, 256, 13, 13]     0          False     
______________________________________________________________________
MaxPool2d            [1, 256, 6, 6]       0          False     
______________________________________________________________________
Flatten              [1, 9216]            0          False     
______________________________________________________________________
Dropout              [1, 9216]            0          False     
______________________________________________________________________
Linear               [1, 4096]            37,752,832 True      
______________________________________________________________________
ReLU                 [1, 4096]            0          False     
______________________________________________________________________
Dropout              [1, 4096]            0          False     
______________________________________________________________________
Linear               [1, 4096]            16,781,312 True      
______________________________________________________________________
ReLU                 [1, 4096]            0          False     
______________________________________________________________________
Linear               [1, 1000]            4,097,000  True      
______________________________________________________________________

Total params: 61,100,840
Total trainable params: 58,631,144
Total non-trainable params: 2,469,696

I think I figured this out: I needed to completely replace the last linear layer instead. Assigning arch.last_linear.out_features = data.c only changes the attribute that appears in the layer’s repr; the layer’s weight and bias tensors are still sized for 1000 outputs, so a forward pass still produces 1000 activations. print(learn.model) just reads that attribute, which is why it looks correct, while learn.summary() actually pushes a batch through the model and reports the real output shapes.
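
For anyone hitting the same thing, here’s roughly what I mean (a minimal sketch rather than my exact notebook code; it assumes the arch and data objects from the snippet above):

# setting the attribute doesn't touch the weights created at construction time
arch.last_linear.out_features = data.c
print(arch.last_linear.weight.shape)   # still torch.Size([1000, 4096])

# replace the layer with a fresh one sized for the dataset
arch.last_linear = nn.Linear(arch.last_linear.in_features, data.c)
print(arch.last_linear.weight.shape)   # torch.Size([37, 4096])

The replacement layer is randomly initialized, which is what you want when fine-tuning on 37 classes anyway; after rebuilding the head from it, learn.summary() should report a 37-way final Linear.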

Could you share your full solution for how you replaced the last linear layer?