Does the Darknet class implement darknet's shortcuts?

Jwtgtc · January 11, 2019, 1:45am

Darknet53 uses shortcut layers. I am going through the lectures again and the darknet class here.

fastai/fastai/blob/master/fastai/vision/models/darknet.py#L22




class ResLayer(nn.Module):
    "Resnet style layer with `ni` inputs."
    def __init__(self, ni:int):
        super().__init__()
        self.conv1=conv_bn_lrelu(ni, ni//2, ks=1)
        self.conv2=conv_bn_lrelu(ni//2, ni, ks=3)

    def forward(self, x): return x + self.conv2(self.conv1(x))


class Darknet(nn.Module):
    "https://github.com/pjreddie/darknet"
    def make_group_layer(self, ch_in:int, num_blocks:int, stride:int=1):
        "starts with conv layer - `ch_in` channels in - then has `num_blocks` `ResLayer`"
        return [conv_bn_lrelu(ch_in, ch_in*2,stride=stride)
               ] + [(ResLayer(ch_in*2)) for i in range(num_blocks)]

    def __init__(self, num_blocks:Collection[int], num_classes:int, nf=32):
        "create darknet with `nf` and `num_blocks` layers"
        super().__init__()
        layers = [conv_bn_lrelu(3, nf, ks=3, stride=1)]

I understand how the blocks are used to build the resnet layers, and how the layers are concatenated together to make the model. The number and architecture of the convolutions make sense. But I can’t seem to understand how the shortcut connections are implemented, like in the darknet53 config here: https://github.com/pjreddie/darknet/blob/master/cfg/darknet53.cfg

I don’t understand how the fastai implementation combines the output of the previous layer with the layer 3 above for the shortcut layer. Is anyone able to provide insight?

bjack913 · January 11, 2019, 1:59am

Look carefully at this method in ResLayer:

def forward(self, x): return x + self.conv2(self.conv1(x))

We’re taking the inputs (x) and adding them element-wise to the output of the two conv layers applied to x. It is an addition, not a concatenation, so the channel count stays the same. In the darknet53.cfg file it says ‘-3’ rather than ‘-2’ because they’re counting backwards from ‘shortcut’ as its own layer. At least, that’s my interpretation.
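To check the ‘-3’ versus ‘-2’ reading, here is a toy sketch in plain Python. It assumes the cfg shortcut layer adds the previous layer’s output to the output of the layer at the relative index `from`, which is my understanding of the darknet format rather than something confirmed in this thread. The "convs" are just arithmetic stand-ins, and `run_cfg` and `res_layer` are hypothetical helper names, not fastai or darknet functions.

```python
# Toy stand-ins for layers: each "conv" is a plain function on a number,
# so the index arithmetic is easy to follow. These are NOT real convolutions.
conv0 = lambda v: v + 1   # layer that produces the block input x
conv1 = lambda v: v * 2   # stands in for the 1x1 conv
conv2 = lambda v: v - 3   # stands in for the 3x3 conv

def run_cfg(layers, x):
    """Run a cfg-style layer list, keeping every layer's output by index."""
    outputs = []
    for i, layer in enumerate(layers):
        if layer["type"] == "conv":
            x = layer["fn"](x)
        elif layer["type"] == "shortcut":
            # Assumed cfg semantics: previous output + output at index i + from.
            x = outputs[i - 1] + outputs[i + layer["from"]]
        outputs.append(x)
    return outputs

def res_layer(x, conv1, conv2):
    """Same shape as ResLayer.forward: input plus two stacked convs."""
    return x + conv2(conv1(x))

cfg = [
    {"type": "conv", "fn": conv0},     # index 0: output is the block input x
    {"type": "conv", "fn": conv1},     # index 1
    {"type": "conv", "fn": conv2},     # index 2
    {"type": "shortcut", "from": -3},  # index 3: 3 + (-3) = 0, the block input
]

cfg_outputs = run_cfg(cfg, 5)
fastai_style = res_layer(conv0(5), conv1, conv2)
print(cfg_outputs[-1], fastai_style)
```

Both paths give the same number, which is consistent with the idea that the cfg counts the shortcut as its own layer while `forward` folds it into the `+`.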

Jwtgtc · January 11, 2019, 1:53pm

Thank you, that makes sense. I completely missed that! I was OK with the -2 vs -3, but I missed where the addition was being done. Another thing I think is interesting, if I understand correctly, is that within a block the shortcut layers actually point to the shortcut layer above them in the same block (not the first shortcut, but the others). After writing out the layers on paper, I can see this is also being handled. Some very compact code compared to other implementations. Thank you so much for the reply.
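The pen-and-paper exercise can be sketched in a few lines of Python. This assumes each group is laid out as one downsampling conv followed by conv, conv, shortcut per block, with every shortcut using `from=-3`. `group_layer_types` and `shortcut_targets` are hypothetical helper names used only for illustration.

```python
def group_layer_types(num_blocks):
    """Layer types for one group: a downsampling conv, then conv/conv/shortcut per block."""
    types = ["conv"]
    for _ in range(num_blocks):
        types += ["conv", "conv", "shortcut"]
    return types

def shortcut_targets(types, offset=-3):
    """For each shortcut, return (its index, source index, source layer type)."""
    return [(i, i + offset, types[i + offset])
            for i, t in enumerate(types) if t == "shortcut"]

targets = shortcut_targets(group_layer_types(3))
for idx, src, src_type in targets:
    print(f"shortcut at {idx} adds the output of layer {src} ({src_type})")
```

Under that layout, only the first shortcut in a group reads from a conv; the later ones read from the previous shortcut. Stacking `ResLayer` modules gives the same chain, since each one receives the previous `ResLayer` output as its `x`.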