I am trying to train a network with an input of size bs*c*40*20 and an output of size bs*c*20*20. The input is two grayscale images concatenated along the height, and the loss is the MSE between the output and the first image (the first half of the concatenated input).
This should be easy to learn, since all the network has to do is output whatever is in the first half of the input.
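To make the setup concrete, here is a minimal sketch of the task with random tensors standing in for real images. The names `first`, `second`, and `baseline` are made up for illustration; the point is only that slicing off the first 20 rows of the input already gives zero loss.

```python
import torch
import torch.nn.functional as F

bs = 4
# Random stand-ins for the two grayscale images (assumed shape bs*1*20*20).
first = torch.rand(bs, 1, 20, 20)
second = torch.rand(bs, 1, 20, 20)

# Concatenate along the height, as the models in this post do.
x = torch.cat((first, second), dim=2)  # (bs, 1, 40, 20)

# The "perfect" output is simply the first 20 rows of the input.
baseline = x[:, :, :20, :]
loss = F.mse_loss(baseline, first)

print(x.shape)      # torch.Size([4, 1, 40, 20])
print(loss.item())  # 0.0
```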
However, I have had no luck training it. I have tried different architectures, both fully connected and convolutional, with Adam and RMSprop as optimizers.
I suspect the problem might be in the way I am making the output smaller than the input.
Here are some of the models I tried:
import torch
import torch.nn as nn


class lin_block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.m = nn.Sequential(nn.Linear(in_features, out_features),
                               nn.BatchNorm1d(1),
                               nn.ReLU(True))  # also tried LeakyReLU

    def forward(self, x):
        return self.m(x)


class SrResnet(nn.Module):
    def __init__(self):
        super().__init__()
        # also tried with dropout
        self.lin = nn.Sequential(lin_block(800, 700),
                                 # nn.Dropout(0.5),
                                 lin_block(700, 600),
                                 # nn.Dropout(0.5),
                                 lin_block(600, 500),
                                 # nn.Dropout(0.5),
                                 lin_block(500, 400))

    def forward(self, x):
        # my data loader gives x as a tuple of images (bs*1*20*20, bs*1*20*20)
        x = torch.cat((x[0], x[1]), 2)
        x = x.view(x.size(0), 1, -1)
        # x is now flat (bs*1*800), so I can use linear layers
        x = self.lin(x)
        # also tried with sigmoid(x) and tanh(x)
        return x.view((-1, 1, 20, 20))
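As a side check on the flattening step, the sketch below (with made-up tensor names and a hypothetical hand-set layer called `copy`) suggests that after `torch.cat(..., 2)` and `view`, the first 400 of the 800 flat features are exactly the first image. Under that assumption, a single bias-free `nn.Linear(800, 400)` whose weight is an identity block followed by zeros reproduces the target, so the mapping itself is representable by a linear layer.

```python
import torch
import torch.nn as nn

bs = 4
first = torch.rand(bs, 1, 20, 20)
second = torch.rand(bs, 1, 20, 20)

# Same preprocessing as in forward(): concatenate along height, then flatten.
x = torch.cat((first, second), 2).view(bs, 1, -1)  # (bs, 1, 800)

# The first 400 flat features are the pixels of the first image.
same_layout = torch.equal(x[:, :, :400], first.view(bs, 1, -1))

# Hand-set weights [I | 0]: copy the first 400 features, ignore the rest.
copy = nn.Linear(800, 400, bias=False)
with torch.no_grad():
    copy.weight.copy_(torch.cat((torch.eye(400), torch.zeros(400, 400)), dim=1))
    out = copy(x).view(-1, 1, 20, 20)

print(same_layout)
print(torch.allclose(out, first))
```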
I ran the learning rate finder and trained with a suitable learning rate, but the network got stuck predicting one of the following:
All pixels the same color
Random-looking pixels that do not change with different inputs
Something that looks roughly like the target image but does not improve, even after hours of training
To fit I used:
learn.fit(0.1, 3, cycle_len=1, cycle_mult=2, wds=1e-6)  # also tried 0.01 as lr
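For scale, and assuming this call follows the restart-style schedule of the old fastai `fit` (second argument is the number of cycles, `cycle_len` is the length of the first cycle in epochs, and `cycle_mult` multiplies the length after each cycle), the arithmetic below gives the number of epochs per call. `cycle_lengths` is a hypothetical helper written for this post, not a library function.

```python
def cycle_lengths(n_cycles, cycle_len, cycle_mult):
    """Epochs in each cycle when every cycle is cycle_mult times longer than the last."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]


lengths = cycle_lengths(3, cycle_len=1, cycle_mult=2)
total_epochs = sum(lengths)
print(lengths, total_epochs)  # [1, 2, 4] 7
```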
For one of the convolutional models I used:
def conv(ni, nf, kernel_size=3, actn=False):
    layers = [nn.Conv2d(ni, nf, kernel_size, padding=kernel_size//2)]
    if actn:
        layers.append(nn.LeakyReLU(inplace=True))
    return nn.Sequential(*layers)


class ResSequential(nn.Module):
    def __init__(self, layers, res_scale=1.0):
        super().__init__()
        self.res_scale = res_scale
        self.m = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.m(x) * self.res_scale


def res_block(nf):
    return ResSequential(
        [conv(nf, nf, actn=True), conv(nf, nf)],
        0.1)


def conv_block(nf):
    return nn.Sequential(conv(nf, nf, actn=True), conv(nf, nf, actn=True))


class SrResnet(nn.Module):
    def __init__(self, c, layers):
        super().__init__()
        features = [conv(1, c)]
        for i in range(layers):
            features.append(res_block(c))  # also tried with conv_block(c)
        features += [conv(c, c),
                     nn.BatchNorm2d(c),
                     nn.AdaptiveAvgPool2d(20),
                     conv(c, 1)]
        self.features = nn.Sequential(*features)

    def forward(self, x):
        x = torch.cat((x[0], x[1]), 2)
        # x is now (bs*1*40*20)
        return self.features(x)
In this case, I got some pretty interesting results after training: the network had perfected the first 3 rows, meaning that they were exactly the same as the first 3 rows of the first input image. The other pixel rows, however, were just random-looking pixels.
After training for longer I got up to 5 rows with correct outputs, but training any longer didn't help. And that was already an awfully long training time for a problem as simple as this.
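Since the convolutions here use `padding=kernel_size//2` and keep the spatial size, the only place this model shrinks 40 rows to 20 is `nn.AdaptiveAvgPool2d(20)`. The sketch below, on a made-up test tensor, checks what that layer does to a 40*20 map: it appears to average each pair of adjacent rows, so output row i is built from input rows 2i and 2i+1, whereas the target row i is input row i. Whether this explains the row-by-row behavior above is only a guess.

```python
import torch
import torch.nn as nn

# A 40x20 map with distinct values so rows are easy to tell apart.
x = torch.arange(40 * 20, dtype=torch.float32).view(1, 1, 40, 20)

pooled = nn.AdaptiveAvgPool2d(20)(x)

# Manual equivalent: average rows (0, 1), (2, 3), ..., (38, 39); width is unchanged.
manual = (x[:, :, 0::2, :] + x[:, :, 1::2, :]) / 2

print(pooled.shape)  # torch.Size([1, 1, 20, 20])
print(torch.allclose(pooled, manual))
```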
Any ideas on what I might be doing wrong? I checked the inputs and they look correct, and the shapes are as expected. The fit call itself runs fine, so the only problem is that the network either does not learn at all or learns extremely slowly. I think the problem must be in my model. Any help would be appreciated!