Fixed! Why doesn't my model learn this simple task?

I am trying to train a network with an input of size bs*c*40*20 and an output of size bs*c*20*20. The input is two grayscale images concatenated along the height dimension, and the loss is the MSE between the first image (the first half of the concatenated input) and the output.

This should be an easy thing to learn, as all it requires is outputting whatever is in the first half of the input.
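To make that concrete: zero loss is achievable by construction, because the target is literally a slice of the input. A minimal sketch (batch size made up):

import torch
import torch.nn.functional as F

# Hypothetical shapes matching the setup: two 20x20 grayscale images stacked to 40x20.
bs = 4
img_a = torch.rand(bs, 1, 20, 20)  # the image the network should reproduce
img_b = torch.rand(bs, 1, 20, 20)  # the second, irrelevant image
x = torch.cat((img_a, img_b), dim=2)  # input: bs x 1 x 40 x 20

# A perfect "model" is just a slice: take the first 20 rows back out.
y_hat = x[:, :, :20, :]
print(F.mse_loss(y_hat, img_a))  # tensor(0.)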

However, I am not having any luck training it. I have tried different architectures, both fully connected and convolutional, and both Adam and RMSprop as optimizers.

I suspect the problem might be in the way I am making the output smaller than the input.

Here are some of the models I tried:

class lin_block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()

        self.m = nn.Sequential(nn.Linear(in_features, out_features),
                               nn.BatchNorm1d(1),  # input is (bs, 1, features): normalizes everything as one channel, with a single scale/shift
                               nn.ReLU(True)) # also tried leakyrelu

    def forward(self, x):
        return self.m(x)

class SrResnet(nn.Module):
    def __init__(self):
        super().__init__()

        # also tried with dropout
        self.lin = nn.Sequential(lin_block(800, 700),
                                 # nn.Dropout(0.5),
                                 lin_block(700, 600),
                                 # nn.Dropout(0.5),
                                 lin_block(600, 500),
                                 # nn.Dropout(0.5),
                                 lin_block(500, 400))

    def forward(self, x):
        # my data loader gave x as a tuple of images (bs*1*20*20, bs*1*20*20)
        x = torch.cat((x[0], x[1]), 2)
        
        x = x.view(x.size(0), 1, -1)
        # x is now flat (bs*1*800), so I can use linear layers
        x = self.lin(x)

        # also tried with sigmoid(x) and tanh(x)
        return x.view((-1, 1, 20, 20))
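
One sanity check for a setup like this is to overfit a single batch: if the wiring is right, the loss should head toward zero within a few hundred steps. A sketch using the model above and random data:

import torch
import torch.nn.functional as F

model = SrResnet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xb = (torch.rand(16, 1, 20, 20), torch.rand(16, 1, 20, 20))  # one fixed batch

for step in range(500):
    opt.zero_grad()
    loss = F.mse_loss(model(xb), xb[0])  # target is the first image
    loss.backward()
    opt.step()
print(loss.item())  # if this plateaus far from zero, the architecture itself is the suspect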

I ran the learning rate finder and trained with a suitable learning rate, but the model got stuck predicting one of:

- All pixels the same color
- Random-looking pixels that do not change with different inputs
- Something that looks roughly like the wanted output but stops improving even with hours of training

To fit I used:

learn.fit(0.1, 3, cycle_len=1, cycle_mult=2, wds=1e-6) # also with 0.01 as lr
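
(If I read the old fastai v0.7 API right, that is SGDR: 3 cosine-annealing cycles of 1, 2, and 4 epochs with weight decay 1e-6. Roughly the same schedule in plain PyTorch, sketched with assumed names for model and train_loader:)

import torch
import torch.nn.functional as F
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# `model` and `train_loader` are stand-ins for the learner's internals.
opt = torch.optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-6)
sched = CosineAnnealingWarmRestarts(opt, T_0=1, T_mult=2)  # cycle_len=1, cycle_mult=2

for epoch in range(7):  # 3 cycles: 1 + 2 + 4 epochs
    for i, (xb, yb) in enumerate(train_loader):
        opt.zero_grad()
        loss = F.mse_loss(model(xb), yb)
        loss.backward()
        opt.step()
        sched.step(epoch + i / len(train_loader))  # fractional epoch for smooth annealing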

For one of the convolutional models I used:

def conv(ni, nf, kernel_size=3, actn=False):
    layers = [nn.Conv2d(ni, nf, kernel_size, padding=kernel_size//2)]
    if actn:
        layers.append(nn.LeakyReLU(inplace=True))
    return nn.Sequential(*layers)

class ResSequential(nn.Module):
    def __init__(self, layers, res_scale=1.0):
        super().__init__()
        self.res_scale = res_scale
        self.m = nn.Sequential(*layers)

    def forward(self, x): return x + self.m(x) * self.res_scale

def res_block(nf):
    return ResSequential(
        [conv(nf, nf, actn=True), conv(nf, nf)],
        0.1)

def conv_block(nf):
    return nn.Sequential(conv(nf, nf, actn=True), conv(nf, nf, actn=True))

class SrResnet(nn.Module):
    def __init__(self, c, layers):
        super().__init__()

        features = [conv(1, c)]

        for i in range(layers):
            features.append(res_block(c)) # also tried with conv_block(c)

        features += [conv(c, c),
                     nn.BatchNorm2d(c),
                     nn.AdaptiveAvgPool2d(20),  # 40x20 -> 20x20
                     conv(c, 1)]

        self.features = nn.Sequential(*features)

    def forward(self, x):
        x = torch.cat((x[0], x[1]), 2)
        # x is now (bs*1*40*20)
        return self.features(x)
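
A quick shape check (c and layers are made-up values here):

import torch

model = SrResnet(c=32, layers=8)
xb = (torch.rand(2, 1, 20, 20), torch.rand(2, 1, 20, 20))
print(model(xb).shape)  # torch.Size([2, 1, 20, 20])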

In this case, I was getting some pretty interesting results after training: the network had perfected the first 3 rows, meaning they came out exactly the same as the first 3 rows of the first input image. The remaining rows, however, were just random-looking pixels.

After training for longer I got up to 5 correct rows, but training any longer didn't help. And that was an awfully long training time for a problem this simple.
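
One thing worth noting, possibly related to my earlier suspicion about shrinking the output: nn.AdaptiveAvgPool2d(20) goes from 40 rows to 20 by averaging adjacent row pairs, so no output row sees a single input row directly, and the layers before the pool have to encode the wanted rows across channels instead of just passing them through. A small demo:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(20)
rows = torch.arange(40.).view(1, 1, 40, 1).expand(1, 1, 40, 20)  # row i holds the value i
print(pool(rows)[0, 0, :3, 0])  # tensor([0.5000, 2.5000, 4.5000]): each output row mixes two input rows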

Any ideas on what I might be doing wrong? I checked the input and it was good, and the shapes were correct, so no problem there either. The fit ran as expected, so the only problem is that the model learns either not at all or extremely slowly. I think the problem must be with my model. Any help would be appreciated! :slight_smile:


I simplified it as much as possible and it works like a charm.

class lin_block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.m = nn.Sequential(nn.Linear(in_features, out_features))

    def forward(self, x):
        return self.m(x)

class SrResnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Sequential(lin_block(800, 400))

    def forward(self, x):
        x = torch.cat((x[0], x[1]), 2)
        x = x.view(x.size(0), 1, -1)
        x = self.lin(x)
        return x.view((-1, 1, 20, 20))
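
That makes sense: with the flattening order in forward, the first image occupies the first 400 features, so a single Linear(800, 400) can represent the exact solution, W = [I | 0], b = 0. A sketch that sets those weights by hand and verifies zero loss:

import torch
import torch.nn.functional as F

model = SrResnet()
lin = model.lin[0].m[0]  # the nn.Linear(800, 400)
with torch.no_grad():
    lin.weight.zero_()
    lin.weight[:, :400] = torch.eye(400)  # copy the first image, ignore the second
    lin.bias.zero_()

xb = (torch.rand(8, 1, 20, 20), torch.rand(8, 1, 20, 20))
print(F.mse_loss(model(xb), xb[0]))  # tensor(0.)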

Why doesn’t the more complicated version work?

Adding a ReLU activation makes it far worse!

class lin_block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.m = nn.Sequential(nn.Linear(in_features, out_features),
                               nn.ReLU(True))

LeakyReLU, on the other hand, works well… I will try the bigger network with it.
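
A plausible reason: the pre-activations start out roughly zero-mean, so a ReLU zeroes about half of them, and those units get exactly zero gradient; LeakyReLU keeps a gradient path everywhere. A tiny demo:

import torch
import torch.nn as nn

z = torch.randn(10_000)  # stand-in for roughly zero-mean pre-activations
print((nn.ReLU()(z) == 0).float().mean())           # ~0.5: half the units output 0 and get no gradient
print((nn.LeakyReLU(0.01)(z) == 0).float().mean())  # ~0.0: every unit keeps a nonzero slope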

Adding the batch norm back didn't make it worse, so I tried with two linear layers:
lin_block(800, 600)
lin_block(600, 400)

Unfortunately, this caused the training to converge to a local minimum that it was unable to escape.

Images showing the first half of x and the full y:

with two lin_blocks - loss: 0.14
[image]
with one lin_block - loss: 0.0001
[image]

Why might this be? What should I do to fix it?

After getting rid of the batch norm and trying the same two lin_blocks as before, I was able to get down to a loss of 0.036:
[image]
This took 4 times as long, and it barely improved after that.
I don't think that should happen…
