L1 cost penalty for specific layer

I am a bit of a newbie with fastai and pytorch, so my apologies if this is a silly question.

I am trying to replicate some results from a paper in which the authors add an L1 penalty for a specific layer in the model. Say I have a model with a couple of linear layers followed by a couple of convolutional layers, and I would like my loss function to put an extra L1 penalty on the output of a hidden layer, in addition to MSELoss on the final output. My network is something like this (a dummy example to show the general idea):

import numpy as np
import torch
import torch.nn as nn

class MyNetwork(nn.Module):
    def __init__(self, samples_in, matrix_out):
        super(MyNetwork, self).__init__()
        self.samples_in = samples_in
        self.matrix_out = matrix_out
        self.samples_out = np.prod(self.matrix_out)
        self.fc1 = nn.Sequential(nn.Linear(self.samples_in*2, self.samples_out), nn.Tanh())
        self.cnv1 = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, stride=1, padding=2), nn.ReLU())
        self.dcnv = nn.Sequential(nn.ConvTranspose2d(in_channels=64, out_channels=1, kernel_size=7, stride=1, padding=3))

    def forward(self, x):
        batch_size = x.shape[0]
        x = self.fc1(x)
        x = x.reshape((batch_size,1) + self.matrix_out)
        x = self.cnv1(x)

        # Calculating L1 norm of the output of convolutional layer:
        l1_term = torch.mean(torch.abs(x))
        x = self.dcnv(x)
        x = x.reshape((batch_size, self.samples_out))

        return x

I would like the cost function to be something like:

def mycost(pred, target):
    cost = ((pred-target)**2).mean() + 0.001*l1_term
    return cost

So in order to do that I could change the forward function to return the l1_term as well, so something like:

def forward(self, x):
   #... other stuff in forward

   return x, l1_term

and then have cost function:

def mycost(pred, target):
    # pred is the (output, l1_term) tuple returned by forward
    cost = ((pred[0]-target)**2).mean() + 0.001*pred[1]
    return cost

This actually seems to work in the sense that training runs, but when I try to make a prediction with the model afterwards, I get an error. It is probably predictable that there would be some issues, but as a newbie it is pretty opaque to me what is happening. Here is the error:

So I think I am probably trying to shoehorn this in the wrong way, and I am wondering if there is a typical pattern one can/should use with fastai. Any help/guidance/commentary would be much appreciated.


Maybe I can have a go at answering this myself :wink: One possible pattern is to register a forward hook on that convolution layer and in that hook, grab the output of the layer, calculate the L1 norm and store it in a variable, which we can then add in when calculating the cost. So something like this:

def myfwdhook(module, input_, output):
    # Store the L1 term of the layer output so the loss function can read it
    global l1_term
    l1_term = output.abs().mean()

def mycustomloss(pred, target):
    return ((pred-target)**2).mean() + 0.0001*l1_term

class MyNetwork(nn.Module):
    def __init__(self, samples_in, matrix_out):
        super(MyNetwork, self).__init__()
        self.samples_in = samples_in
        self.matrix_out = matrix_out
        self.samples_out = np.prod(self.matrix_out)
        self.fc1 = nn.Sequential(nn.Linear(self.samples_in*2, self.samples_out), nn.Tanh())
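        self.cnv1 = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, stride=1, padding=2), nn.ReLU())
        # Register the hook so myfwdhook runs on every forward pass through cnv1
        self.cnv1.register_forward_hook(myfwdhook)
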
        self.dcnv = nn.Sequential(nn.ConvTranspose2d(in_channels=64, out_channels=1, kernel_size=7, stride=1, padding=3))

    def forward(self, x):
        batch_size = x.shape[0]
        x = self.fc1(x)
        x = x.reshape((batch_size,1) + self.matrix_out)
        x = self.cnv1(x)
        x = self.dcnv(x)
        x = x.reshape((batch_size, self.samples_out))

        return x

Or something like that.
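The hook pattern above can also be written without a global variable. The following is only a minimal sketch under assumptions: `ActivationL1` is a hypothetical helper name, and the two-layer toy model merely stands in for the network in the post. It uses plain PyTorch (`register_forward_hook` and the handle it returns) and has not been tried inside a fastai `Learner`.

```python
import torch
import torch.nn as nn


class ActivationL1:
    """Hypothetical helper: keeps the mean |activation| of a hooked layer."""

    def __init__(self, module: nn.Module, weight: float = 1e-4):
        self.weight = weight
        self.value = None
        # register_forward_hook returns a handle that can remove the hook later.
        self.handle = module.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Keep the tensor attached to the graph so gradients flow through it.
        self.value = output.abs().mean()

    def loss(self, pred, target):
        # MSE on the prediction plus the weighted L1 term from the last forward pass.
        return ((pred - target) ** 2).mean() + self.weight * self.value

    def remove(self):
        self.handle.remove()


torch.manual_seed(0)
# Toy stand-in for the network in the post: a hidden conv layer, then an output layer.
hidden = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU())
model = nn.Sequential(hidden, nn.Conv2d(4, 1, kernel_size=3, padding=1))

penalty = ActivationL1(hidden, weight=1e-4)
x = torch.randn(2, 1, 8, 8)
target = torch.randn(2, 1, 8, 8)

pred = model(x)                      # the hook fires here and fills penalty.value
total = penalty.loss(pred, target)
total.backward()

penalty.remove()                     # detach the hook, e.g. before inference
```

Because the model still returns a single tensor, prediction code that expects one output is unaffected; the penalty only enters through the loss function.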

In my continued conversation with myself, I can maybe try another approach, which I think is the fastai way to do this with a Learner callback. So I could do something like:

class L1RegCallback(Callback):
    def __init__(self, reglambda = 0.0001):
        self.reglambda = reglambda
    def before_backward(self):
        regularization_loss = 0.0
        # cnv1 is the layer to penalize; note this loop covers its parameters (weights), not its output
        for param in self.learn.model.cnv1.parameters():
            regularization_loss += torch.mean(torch.abs(param))
        self.learn.loss += self.reglambda*regularization_loss

And then something like:

learn = Learner(dls, mynetwork, opt_func=RMSProp, loss_func = nn.MSELoss(reduction='mean'), metrics=nn.MSELoss(reduction='mean'), cbs=[L1RegCallback()])

And it would add the L1 loss to the loss (before backpropagation).
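One thing worth noting is that this callback penalizes the layer's parameters, whereas the original question was about the layer's output. Also, depending on the fastai version, the tensor used for the backward pass may be `learn.loss_grad` rather than `learn.loss`, so that is worth verifying. Outside fastai, the parameter version of the penalty can be sketched in plain PyTorch as follows; `l1_param_penalty` is a hypothetical helper name and the linear layer is only a toy stand-in.

```python
import torch
import torch.nn as nn


def l1_param_penalty(module: nn.Module) -> torch.Tensor:
    # Sum of mean |param| over each parameter tensor of the module,
    # mirroring the loop in the callback.
    return sum(p.abs().mean() for p in module.parameters())


torch.manual_seed(0)
layer = nn.Linear(3, 2)              # toy stand-in for the penalized layer
x, target = torch.randn(5, 3), torch.randn(5, 2)
reg_lambda = 1e-4

mse = nn.functional.mse_loss(layer(x), target)
loss = mse + reg_lambda * l1_param_penalty(layer)
loss.backward()                      # gradients now include the L1 term
```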

@sachinruk and @sgugger you have had some other threads on this:

But I think some of that was based on a previous version of fastai, so I was wondering what your thoughts are: is the approach above correct for adding an L1 loss for a specific layer?

Hey, just to quickly answer your question: I found better (faster) convergence when updating the weights directly instead of adding the penalty to the loss, as you seem to be doing here.

The only thing I’d change in my answer is that it ought to be param.data = param.data - learning_rate * self.beta * sign.
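For illustration, the direct update can be sketched in plain PyTorch. This is a guess at the intent rather than the code from the referenced answer: it assumes `sign` means `torch.sign(param)` and that `beta` is the L1 strength, and `l1_sign_step` is a hypothetical helper name. In a training loop it would presumably run once per batch alongside the optimizer step.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def l1_sign_step(module: nn.Module, learning_rate: float, beta: float) -> None:
    # Subgradient step for beta * |param|: move each weight toward zero
    # by learning_rate * beta, in the direction given by its sign.
    for param in module.parameters():
        param.sub_(learning_rate * beta * torch.sign(param))


torch.manual_seed(0)
layer = nn.Linear(4, 3)              # toy stand-in for the penalized layer
before = layer.weight.detach().clone()

l1_sign_step(layer, learning_rate=0.1, beta=0.01)
```

The step size here does not depend on the magnitude of the loss gradient, which is one way this differs from adding the penalty to the loss and letting an adaptive optimizer such as RMSProp rescale it.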

