Unexpected number of channels in DynamicUnet model

Hi!
I was using the DynamicUnet model and noticed that the number of channels in the decoder's layers is not always a power of two. For example, if you run the following code:

from fastai.vision.all import *
m = resnet34()
m = nn.Sequential(*list(m.children())[:-2])   # keep only the encoder body (drop pooling + fc)
tst = DynamicUnet(m, 3, (128,128), norm_type=None)   # 3 output channels, 128x128 input
sizes = model_sizes(tst, (128, 128))   # record the output size of each layer
sizes

you will see that sizes is:

[torch.Size([1, 512, 4, 4]),
 torch.Size([1, 512, 4, 4]),
 torch.Size([1, 512, 4, 4]),
 torch.Size([1, 512, 4, 4]),
 torch.Size([1, 512, 8, 8]),
 torch.Size([1, 384, 16, 16]),
 torch.Size([1, 256, 32, 32]),
 torch.Size([1, 96, 64, 64]),
 torch.Size([1, 96, 128, 128]),
 torch.Size([1, 96, 128, 128]),
 torch.Size([1, 99, 128, 128]),
 torch.Size([1, 99, 128, 128]),
 torch.Size([1, 3, 128, 128])]

The layers with 384, 96, and 99 channels caught my attention; I was expecting powers of two.
This comes from the implementation of UnetBlock: PixelShuffle_ICNR upsamples the up_in_c incoming channels down to up_in_c//2, and the result is concatenated with the x_in_c channels of the skip connection, so each block works on ni = up_in_c//2 + x_in_c channels, which is generally not a power of two.
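
For concreteness, here is a minimal sketch of that arithmetic (the skip widths 256, 128, 64, 64 are what model_sizes reports for the resnet34 body at a 128x128 input, deepest skip first); it reproduces the 512, 384, 256, 96 and 99 seen above:

skips   = [256, 128, 64, 64]     # x_in_c of each UnetBlock, deepest skip first
up_in_c = 512                    # channels leaving the middle conv at 4x4

widths = []
for i, x_in_c in enumerate(skips):
    shuf = up_in_c // 2          # PixelShuffle_ICNR(up_in_c, up_in_c//2)
    ni   = shuf + x_in_c         # concatenation with the skip connection
    final_div = i != len(skips) - 1
    nf   = ni if final_div else ni // 2
    widths.append(nf)
    up_in_c = nf

print(widths)                    # [512, 384, 256, 96]
print(widths[-1] + 3)            # 99, after last_cross concatenates the 3-channel input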

Is this a feature or a bug? Or neither? What do you think?

Thanks!!

To add some context: if I change two lines in the definition of UnetBlock, I get the “standard” number of channels, like so:

class UnetBlock(Module):
    "A quasi-UNet block, using `PixelShuffle_ICNR upsampling`."
    @delegates(ConvLayer.__init__)
    def __init__(self, up_in_c, x_in_c, hook, final_div=True, blur=False, act_cls=defaults.activation,
                 self_attention=False, init=nn.init.kaiming_normal_, norm_type=None, **kwargs):
        self.hook = hook
        #self.shuf = PixelShuffle_ICNR(up_in_c, up_in_c//2, blur=blur, act_cls=act_cls, norm_type=norm_type)
        self.shuf = PixelShuffle_ICNR(up_in_c, x_in_c, blur=blur, act_cls=act_cls, norm_type=norm_type)
        self.bn = BatchNorm(x_in_c)
        #ni = up_in_c//2 + x_in_c
        ni = 2 * x_in_c
        nf = ni if final_div else ni//2
        self.conv1 = ConvLayer(ni, nf, act_cls=act_cls, norm_type=norm_type, **kwargs)
        self.conv2 = ConvLayer(nf, nf, act_cls=act_cls, norm_type=norm_type,
                               xtra=SelfAttention(nf) if self_attention else None, **kwargs)
        self.relu = act_cls()
        apply_init(nn.Sequential(self.conv1, self.conv2), init)

    def forward(self, up_in):
        s = self.hook.stored
        up_out = self.shuf(up_in)
        ssh = s.shape[-2:]
        if ssh != up_out.shape[-2:]:
            up_out = F.interpolate(up_out, s.shape[-2:], mode='nearest')
        cat_x = self.relu(torch.cat([up_out, self.bn(s)], dim=1))
        return self.conv2(self.conv1(cat_x))

This is what I would expect from a U-Net: the number of channels in a decoder layer should be twice the number of channels in the encoder layer it is skip-connected to.
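
In case anyone wants to reproduce this, here is a hedged sketch of how I re-check model_sizes with the modified block. It assumes the class above has been executed in the current namespace, and that DynamicUnet resolves UnetBlock from fastai.vision.models.unet when it builds the decoder, so patching that attribute is enough:

import fastai.vision.models.unet as unet_mod
unet_mod.UnetBlock = UnetBlock   # swap in the modified block defined above

m = resnet34()
m = nn.Sequential(*list(m.children())[:-2])
tst = DynamicUnet(m, 3, (128,128), norm_type=None)
model_sizes(tst, (128, 128))     # the 384/96 widths from the first listing should be gone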