Image Regression using fastai

mossCoder · March 24, 2019, 4:55pm

Thanks for the advice. I have made some progress with regard to custom head modifications.

I simply added a sigmoid to the existing head. I first checked which layers the head contains by running:

```python
learn.model[1]
```

So my head for a densenet121 model with sigmoid added looks like:

```python
from fastai.vision import *  # fastai v1: provides nn, AdaptiveConcatPool2d, Flatten

head = nn.Sequential(
    AdaptiveConcatPool2d(),  # avg + max pooling: 1024 densenet121 channels -> 2048 features
    Flatten(),
    nn.BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
    nn.Dropout(p=0.25),
    nn.Linear(in_features=2048, out_features=512, bias=True),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
    nn.Dropout(p=0.5),
    nn.Linear(in_features=512, out_features=1, bias=True),
    nn.Sigmoid()             # added: squashes the output into (0, 1)
)
```
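The 2048 input features of the first BatchNorm1d and Linear layers come from densenet121's final feature map having 1024 channels, which concat pooling doubles by stacking the average- and max-pooled results. The plain-PyTorch sketch below checks that arithmetic and the (0, 1) output range of the final Sigmoid. `ConcatPool` is a hypothetical stand-in for fastai's `AdaptiveConcatPool2d`, and the random tensor is an assumed stand-in for the densenet121 backbone output.

```python
import torch
import torch.nn as nn

class ConcatPool(nn.Module):
    """Hypothetical stand-in for fastai's AdaptiveConcatPool2d:
    concatenates adaptive max- and average-pooled features along channels."""
    def __init__(self):
        super().__init__()
        self.mp = nn.AdaptiveMaxPool2d(1)
        self.ap = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], dim=1)

head = nn.Sequential(
    ConcatPool(),               # (bs, 1024, H, W) -> (bs, 2048, 1, 1)
    nn.Flatten(),               # -> (bs, 2048)
    nn.BatchNorm1d(2048),
    nn.Dropout(p=0.25),
    nn.Linear(2048, 512),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(512),
    nn.Dropout(p=0.5),
    nn.Linear(512, 1),
    nn.Sigmoid(),               # output limited to (0, 1)
)

# Fake densenet121 feature map: batch of 4, 1024 channels, 7x7 grid
features = torch.randn(4, 1024, 7, 7)
head.eval()                     # running BatchNorm stats, dropout disabled
with torch.no_grad():
    out = head(features)
print(out.shape)                # torch.Size([4, 1])
```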

The model trains much more quickly with this addition and no longer predicts values below zero, though the performance after training isn’t much better.

I did find a Kaggle example where someone implemented the scaled sigmoid in the model's forward method: https://www.kaggle.com/rasmus01610/nih-chest-x-ray-age

From their notebook:

```python
from fastai.vision import *  # fastai v1: provides models, nn, torch, AdaptiveConcatPool2d, Flatten

class AgeModel(nn.Module):
    def __init__(self):
        super().__init__()
        layers = list(models.resnet34().children())[:-2]
        layers += [AdaptiveConcatPool2d(), Flatten()]
        layers += [nn.Linear(1024,16), nn.ReLU(), nn.Linear(16,1)]
        self.agemodel = nn.Sequential(*layers)
    def forward(self, x):
        x = self.agemodel(x).squeeze()
        # min_age and max_age (the target range) must be defined beforehand
        return torch.sigmoid(x) * (max_age - min_age) + min_age
```

As far as I can tell,

```python
layers = list(models.resnet34().children())[:-2]
```

removes the last two layers of the resnet (the average-pooling layer and the final fully connected classifier), and then they add a few layers of their own. I imagined those additions were specific to their application, so I tried to reproduce the approach without removing the last two layers, simply copy-pasting the densenet layers and modifying the forward as follows:

```python
class CoverModel(nn.Module):
    def __init__(self):
        super().__init__()
        layers = list(models.densenet121().children())
        self.covermodel = nn.Sequential(*layers)
    def forward(self, x):
        x = self.covermodel(x).squeeze()
        return torch.sigmoid(x) * (max_cover - min_cover) + min_cover
```

Setting the arch to CoverModel() throws an error, however, so there is still some work to be done. Thanks for your help.
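A likely cause of that error: torchvision's DenseNet has only two children, `features` and `classifier`, and applies its final ReLU and global average pooling inside `forward()` rather than as child modules. `nn.Sequential(*children())` therefore feeds a 4-D feature map straight into the 1000-class `classifier` Linear layer. `max_cover` and `min_cover` also have to be defined somewhere, and I believe fastai v1's `cnn_learner` expects an architecture function (like `models.densenet121`) rather than a model instance, so a finished module would normally be wrapped with `Learner(data, model)` instead. The sketch below follows the Kaggle pattern: keep a convolutional body, then pool, flatten, add a fresh linear layer and rescale a sigmoid into (ymin, ymax). `ScaledRegressor` and its parameters are hypothetical names, and the tiny conv body only keeps the example fast.

```python
import torch
import torch.nn as nn

class ScaledRegressor(nn.Module):
    """Hypothetical model: conv body -> pooled features -> one output
    squashed by a sigmoid and rescaled to (ymin, ymax)."""
    def __init__(self, body, n_features, ymin=0.0, ymax=100.0):
        super().__init__()
        self.body = body
        self.pool = nn.AdaptiveAvgPool2d(1)   # (bs, C, H, W) -> (bs, C, 1, 1)
        self.flatten = nn.Flatten()           # -> (bs, C)
        self.fc = nn.Linear(n_features, 1)    # new regression output
        self.ymin, self.ymax = ymin, ymax

    def forward(self, x):
        x = self.fc(self.flatten(self.pool(self.body(x))))
        x = x.squeeze(-1)                     # (bs, 1) -> (bs,), keeps batch dim when bs == 1
        return torch.sigmoid(x) * (self.ymax - self.ymin) + self.ymin

# Tiny stand-in body with 8 output channels (assumption, for speed only)
body = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
model = ScaledRegressor(body, n_features=8, ymin=0.0, ymax=100.0)
preds = model(torch.randn(2, 3, 32, 32))
print(preds.shape)                            # torch.Size([2])
```

With torchvision, the body would be `models.densenet121().features` with `n_features=1024`; since torchvision applies a ReLU after `features` in its own forward, you may want to append `nn.ReLU()` to that body. Using `squeeze(-1)` instead of a bare `squeeze()` avoids collapsing the batch dimension for a batch of one.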

mossCoder · April 16, 2019, 5:51pm

I’ve come upon an alternative way to add layers to the head that is simpler than the one in my last post. To append a single layer to the model head, you can use PyTorch's add_module method:

```python
learn = cnn_learner(data,
                    models.densenet121,
                    metrics=explained_variance)

learn.model[1].add_module("sigmoid", module=nn.Sigmoid())
```
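`add_module` registers a new child under the given name, and because `nn.Sequential` runs its children in registration order, the new module becomes the last step of the head's forward pass. A small plain-PyTorch check, using a toy `nn.Sequential` as an assumed stand-in for `learn.model[1]`:

```python
import torch
import torch.nn as nn

head = nn.Sequential(nn.Linear(4, 1))        # stand-in for learn.model[1]
head.add_module("sigmoid", nn.Sigmoid())     # appended after the Linear layer

names = [name for name, _ in head.named_children()]
print(names)                                 # ['0', 'sigmoid']
out = head(torch.randn(3, 4))                # Linear -> Sigmoid
```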

mossCoder · April 19, 2019, 7:08pm

I think I finally got it. My approach was to write a custom PyTorch module:

```python
ymin = 0
ymax = 100

class scaledSigmoid(nn.Module):
    def forward(self, input):
        return torch.sigmoid(input) * (ymax - ymin) + ymin
```

Putting it together with my last post:

```python
learn = cnn_learner(data,
                    models.densenet121,
                    metrics=explained_variance)

learn.model[1].add_module("sSig", module=scaledSigmoid())
```
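The `scaledSigmoid` module above reads `ymin` and `ymax` from globals. A variant that stores the bounds on the module itself, so different heads can use different ranges, might look like this; the class name `ScaledSigmoid` and its constructor arguments are my own choices, not a fastai API.

```python
import torch
import torch.nn as nn

class ScaledSigmoid(nn.Module):
    """Maps any real input into (ymin, ymax) with a rescaled sigmoid."""
    def __init__(self, ymin=0.0, ymax=100.0):
        super().__init__()
        self.ymin, self.ymax = ymin, ymax

    def forward(self, x):
        return torch.sigmoid(x) * (self.ymax - self.ymin) + self.ymin

act = ScaledSigmoid(0.0, 100.0)
y = act(torch.tensor([-20.0, 0.0, 20.0]))   # sigmoid(0) = 0.5, so the middle value is 50
```

It would be attached the same way, e.g. `learn.model[1].add_module("sSig", ScaledSigmoid(0, 100))`.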

After I talked it over with a colleague, however, he suggested a modified ReLU, since a sigmoid isn’t ideal for predicting at the extrema. This is the module that works best for me, with response data scaled from 0 to 100:

```python
ymin = 0
ymax = 100

class clampedReLU(nn.Module):
    def forward(self, input):
        # overwrite out-of-range predictions (in place) with the bounds
        bottomClamp = input < ymin
        topClamp = input > ymax
        input[bottomClamp] = ymin
        input[topClamp] = ymax
        return input
```

```python
learn = cnn_learner(data,
                    models.densenet121,
                    metrics=explained_variance)

learn.model[1].add_module("cReLU", module=clampedReLU())
```

It appears to behave as expected when added as a final layer: I’m no longer predicting above 100 or below 0.

Pomo · April 19, 2019, 9:25pm

I think your forward code does exactly what torch.clamp() does, except that the latter runs a gazillion times faster.
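Pomo's suggestion can be sketched as a module that wraps `torch.clamp`; it produces the same values as `clampedReLU` without building boolean masks or modifying its input in place. `ClampRange` is a hypothetical name. Like the mask version, it passes zero gradient back to predictions that fall outside [ymin, ymax], which is worth keeping in mind when comparing it with a sigmoid.

```python
import torch
import torch.nn as nn

class ClampRange(nn.Module):
    """Clamps predictions to [ymin, ymax] with torch.clamp (not in place)."""
    def __init__(self, ymin=0.0, ymax=100.0):
        super().__init__()
        self.ymin, self.ymax = ymin, ymax

    def forward(self, x):
        return torch.clamp(x, min=self.ymin, max=self.ymax)

x = torch.tensor([-5.0, 42.0, 130.0], requires_grad=True)
y = ClampRange(0.0, 100.0)(x)   # clamped values: 0, 42, 100
y.sum().backward()
print(x.grad)                   # gradients: 0 for the clamped entries, 1 for 42
```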

mossCoder · April 20, 2019, 3:54am

Great, thanks for sharing that.

It’s interesting that Jeremy advocates for scaling with a sigmoid. Is my understanding correct that models with a final sigmoid layer struggle to predict the extrema? Why choose sigmoid over a clamp approach?
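One way to see why a scaled sigmoid struggles near the extrema: to output a target y, the pre-sigmoid value must be the logit of the rescaled target, which grows without bound as y approaches either end of the range, while the slope (ymax − ymin)·σ(z)(1 − σ(z)) shrinks toward zero. A short plain-Python calculation for the 0 to 100 range used above; the function names are illustrative.

```python
import math

def required_logit(y, ymin=0.0, ymax=100.0):
    """Pre-sigmoid value z such that sigmoid(z) * (ymax - ymin) + ymin == y."""
    p = (y - ymin) / (ymax - ymin)
    return math.log(p / (1.0 - p))

def slope(z, ymin=0.0, ymax=100.0):
    """Derivative of the scaled sigmoid at z: (ymax - ymin) * s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return (ymax - ymin) * s * (1.0 - s)

for y in (50.0, 90.0, 99.0, 99.9):
    z = required_logit(y)
    print(f"y={y:5.1f}  logit={z:6.2f}  slope={slope(z):.4f}")
```

At y = 50 the slope is 25, at y = 99 it has fallen to about 0.99, and at y = 99.9 to about 0.1, so the gradient reaching the earlier layers shrinks sharply exactly where extreme targets need the largest logits. A common workaround is to make the sigmoid range slightly wider than the data range so the extremes sit where the curve is still steep; a clamp has no such squeeze inside the range but gives zero gradient outside it.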

rob2 · April 23, 2019, 11:51am

Hi,

I am also working on an image regression problem using convnets in fastai.
The goal is to learn a known homography matrix (a 3x3 matrix) from a given black-and-white image (currently stored as RGB). The input to the network is an image and the outputs are the nine entries of the homography matrix.
The problem is that the entries of the matrix can differ greatly in magnitude.

I followed the common approach to create the dataset:

```python
data = (ImageList.from_folder(path)
                .split_by_rand_pct(0.1)
                .label_from_func(get_y, label_cls = FloatList)
                .databunch(bs=8)
                .normalize(do_y=True))
```

Here is the head that fastai creates by default:

```
(1): Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Flatten()
  (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25)
  (4): Linear(in_features=1024, out_features=512, bias=True)
  (5): ReLU(inplace)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5)
  (8): Linear(in_features=512, out_features=9, bias=True)
)
```

However, training does not show much progress.
[Image: training progress]

I suspect this is due to the very large values of some of the outputs.
Do you have any tips or tricks for getting this to work?

Thanks a lot!
Robert

mossCoder · April 23, 2019, 10:53pm

Perhaps you could z-scale your response data? Subtract mean and then divide by standard deviation within each response column?
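A minimal NumPy sketch of per-column z-scoring for an (n_samples, 9) matrix of flattened homography targets. The array `Y` here is random placeholder data with deliberately mismatched column scales; in practice the mean and standard deviation would be computed on the training split only and kept so that predictions can be mapped back.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder targets: 100 samples x 9 flattened homography entries on very different scales
Y = rng.normal(size=(100, 9)) * np.array([1, 1, 300, 1, 1, 200, 1e-3, 1e-3, 1])

mean = Y.mean(axis=0)           # per-column mean
std = Y.std(axis=0)             # per-column standard deviation
std[std == 0] = 1.0             # guard against constant columns (e.g. an entry that is always 1)

Y_scaled = (Y - mean) / std     # train the network on these targets
Y_back = Y_scaled * std + mean  # map predictions back to the original units the same way
```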

mossCoder · April 23, 2019, 10:55pm

Or log transform if there are outliers within a column.
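Homography entries can be negative, so a plain log would fail on them. One option, swapped in here rather than taken from the thread, is a signed `log1p` that compresses large magnitudes, keeps the sign, and is exactly invertible. The function names are hypothetical.

```python
import numpy as np

def signed_log(y):
    """Compress large magnitudes: sign(y) * log(1 + |y|)."""
    return np.sign(y) * np.log1p(np.abs(y))

def signed_log_inv(z):
    """Inverse of signed_log: sign(z) * (exp(|z|) - 1)."""
    return np.sign(z) * np.expm1(np.abs(z))

y = np.array([-250.0, -0.5, 0.0, 0.002, 1.0, 4000.0])
z = signed_log(y)               # 4000 becomes roughly 8.3, small values stay nearly unchanged
y_back = signed_log_inv(z)      # recovers the original targets
```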