So I’m creating a CNN from scratch. As I understand it, a fully connected layer means the last layer of the network has a kernel the same size as its input.
Jeremy said VGG has fully connected layers and that it’s kind of slow and heavy.
But ResNet doesn’t have fully connected layers. What does that mean? How do you write something without fully connected layers? I thought that last step was mandatory.
The problem with a fully-connected layer is that it always expects its input to be a vector of a fixed size. But a convolution layer (or pooling layer) doesn’t care about the size of the input.
So if the entire network is made up of conv / pooling layers, you can more easily use it on images of different sizes. That’s a big reason why almost no one uses large FC layers anymore.
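To make that concrete, here is a minimal sketch (layer sizes are arbitrary, just for illustration): the conv layer happily processes two different input sizes, while the Linear layer's weight matrix is tied to one specific flattened size and fails on anything else.

```python
import torch
import torch.nn as nn

# A conv layer is defined only by channel counts and kernel size,
# so it accepts any spatial size:
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
out_small = conv(torch.randn(1, 3, 32, 32))
out_large = conv(torch.randn(1, 3, 64, 64))
print(out_small.shape, out_large.shape)  # (1, 16, 32, 32) and (1, 16, 64, 64)

# A fully-connected layer bakes the input size into its weight matrix:
fc = nn.Linear(3 * 32 * 32, 10)  # only works for 32x32 inputs
try:
    fc(torch.randn(1, 3, 64, 64).flatten(1))
except RuntimeError as e:
    print("shape mismatch:", e)  # a 64x64 input no longer fits
```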
Who says ResNet doesn’t have fully connected layers? It has at least one; see here:
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512 * block.expansion, num_classes)

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
If you use the fastai pretrained version with a custom head, it has two.
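Here is a minimal sketch of why that `avgpool` + `fc` tail still works on any input size (the `TinyNet` model and its layer sizes are hypothetical, not from torchvision): `AdaptiveAvgPool2d((1, 1))` squeezes whatever feature map comes out of the conv stack down to 1x1, so the single FC layer always sees the same vector length.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical ResNet-style tail: conv features, adaptive pool, one fc."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)       # always (N, 128, 1, 1), whatever the input size
        x = torch.flatten(x, 1)   # (N, 128)
        return self.fc(x)

net = TinyNet()
print(net(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 10])
print(net(torch.randn(2, 3, 320, 320)).shape)  # torch.Size([2, 10])
```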
VGG has 3 FC layers at the end, but what makes it heavy and slow is that the middle one is huge (4096x4096) and the others have to lead up to and down from it, so there are millions of weights in those final layers. More modern architectures use much smaller FC layers, and often only one or two. (Yes, there are also nets without FC layers, but ResNet is not one of them.)
def __init__(self, features, num_classes=1000, init_weights=True):
    super().__init__()
    self.features = features
    self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
    self.classifier = nn.Sequential(
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(True), nn.Dropout(),
        nn.Linear(4096, 4096), nn.ReLU(True), nn.Dropout(),
        nn.Linear(4096, num_classes),
    )

def forward(self, x):
    x = self.features(x)
    x = self.avgpool(x)
    x = torch.flatten(x, 1)
    return self.classifier(x)
(examples from the torchvision implementations used in fastai)
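You can check the "millions of weights" claim directly. A quick sketch, counting only the three Linear layers from the classifier above (the ReLU and Dropout layers add no parameters):

```python
import torch.nn as nn

# The three FC layers from the VGG classifier, stripped of the
# parameter-free activation and dropout layers:
classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.Linear(4096, 4096),
    nn.Linear(4096, 1000),
)

for layer in classifier:
    print(tuple(layer.weight.shape), f"{layer.weight.numel():,} weights")
# (4096, 25088) 102,760,448 weights
# (4096, 4096) 16,777,216 weights
# (1000, 4096) 4,096,000 weights

total = sum(p.numel() for p in classifier.parameters())
print(f"total: {total:,}")  # total: 123,642,856
```

Roughly 124 million parameters in the FC head alone, and note that the first layer (leading up to the 4096-wide middle) is actually the biggest chunk.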
Note that in this case using a 1x1 conv layer is identical to using a fully-connected layer. So while this particular implementation of ResNet has one, others may use a 1x1 conv here.
(Key is the AdaptiveAvgPool2d layer that precedes it, which reduces the feature map to 1x1.)
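To make that equivalence concrete, here's a quick sketch (the variable names are just for illustration): copying a Linear layer's weights into a 1x1 conv produces the same output on the 1x1 feature map that AdaptiveAvgPool2d leaves behind.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 512, 1, 1)  # feature map after AdaptiveAvgPool2d((1, 1))

fc = nn.Linear(512, 10)
conv1x1 = nn.Conv2d(512, 10, kernel_size=1)

# Reuse the fc weights in the 1x1 conv: same numbers, just reshaped.
with torch.no_grad():
    conv1x1.weight.copy_(fc.weight.view(10, 512, 1, 1))
    conv1x1.bias.copy_(fc.bias)

out_fc = fc(x.flatten(1))         # (4, 10)
out_conv = conv1x1(x).flatten(1)  # (4, 10)
print(torch.allclose(out_fc, out_conv, atol=1e-5))  # True
```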