Implementing center loss, which has its own parameters to optimize

ducha-aiki · December 17, 2018, 9:23am

Hi,

I am trying to implement center loss in fast.ai, but I am struggling with how to structure it.
https://github.com/KaiyangZhou/pytorch-center-loss (Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.)
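For reference, the objective in the cited paper penalizes the squared distance between each deep feature $\mathbf{x}_i$ and the center $\mathbf{c}_{y_i}$ of its class over a mini-batch of $m$ samples, and adds it to the softmax loss $\mathcal{L}_S$ with a weight $\lambda$. The PyTorch code below computes all sample-to-center squared distances at once using the standard expansion of the squared Euclidean norm, and averages over the batch rather than using the $\tfrac{1}{2}\sum$ form; that constant factor only rescales $\lambda$.

$$
\mathcal{L} = \mathcal{L}_S + \lambda\,\mathcal{L}_C, \qquad
\mathcal{L}_C = \frac{1}{2}\sum_{i=1}^{m} \lVert \mathbf{x}_i - \mathbf{c}_{y_i} \rVert_2^2
$$

$$
\lVert \mathbf{x}_i - \mathbf{c}_j \rVert_2^2 = \lVert \mathbf{x}_i \rVert_2^2 + \lVert \mathbf{c}_j \rVert_2^2 - 2\,\mathbf{x}_i^{\top}\mathbf{c}_j
$$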

The PyTorch implementation looks like this:

```python
import torch
import torch.nn as nn


class CenterLoss(nn.Module):
    """Center loss.
    
    Reference:
    Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.
    
    Args:
        num_classes (int): number of classes.
        feat_dim (int): feature dimension.
    """
    def __init__(self, num_classes=10, feat_dim=2, use_gpu=True):
        super(CenterLoss, self).__init__()
        self.num_classes = num_classes
        self.feat_dim = feat_dim
        self.use_gpu = use_gpu

        # One learnable center per class; as an nn.Parameter it must be
        # handed to an optimizer in order to be updated.
        if self.use_gpu:
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim).cuda())
        else:
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim))

    def forward(self, x, labels):
        """
        Args:
            x: feature matrix with shape (batch_size, feat_dim).
            labels: ground truth labels with shape (batch_size).
        """
        batch_size = x.size(0)
        # distmat[i, j] starts as ||x_i||^2 + ||c_j||^2, shape (batch_size, num_classes)
        distmat = torch.pow(x, 2).sum(dim=1, keepdim=True).expand(batch_size, self.num_classes) + \
                  torch.pow(self.centers, 2).sum(dim=1, keepdim=True).expand(self.num_classes, batch_size).t()
        # distmat = 1 * distmat + (-2) * (x @ centers^T), i.e. ||x_i - c_j||^2.
        # Keyword form; the positional (beta, alpha, mat1, mat2) overload is deprecated.
        distmat.addmm_(x, self.centers.t(), beta=1, alpha=-2)

        # mask[i, j] is True only where j == labels[i]
        classes = torch.arange(self.num_classes).long()
        if self.use_gpu: classes = classes.cuda()
        labels = labels.unsqueeze(1).expand(batch_size, self.num_classes)
        mask = labels.eq(classes.expand(batch_size, self.num_classes))

        # Keep only each sample's distance to its own class center
        dist = []
        for i in range(batch_size):
            value = distmat[i][mask[i]]
            value = value.clamp(min=1e-12, max=1e+12) # for numerical stability
            dist.append(value)
        dist = torch.cat(dist)
        loss = dist.mean()

        return loss
```
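The per-sample loop and the full distance matrix are not strictly needed: indexing the centers with the labels picks each sample's own center directly. The sketch below is a hypothetical loop-free variant (`VectorizedCenterLoss` is an illustrative name, not part of the linked repo) that should compute the same batch-averaged value, assuming integer class labels on the same device as the module.

```python
import torch
import torch.nn as nn


class VectorizedCenterLoss(nn.Module):
    """Hypothetical loop-free variant of CenterLoss (illustrative sketch)."""

    def __init__(self, num_classes=10, feat_dim=2):
        super().__init__()
        # Registered as a Parameter, so .to(device) on the module moves it too
        # and no separate use_gpu flag is needed.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, x, labels):
        # Each sample's own class center: shape (batch_size, feat_dim).
        batch_centers = self.centers[labels]
        # Squared Euclidean distance from each feature to its center.
        dist = (x - batch_centers).pow(2).sum(dim=1)
        # Same clamping as the loop version, then average over the batch.
        return dist.clamp(min=1e-12, max=1e12).mean()
```

This avoids materializing a `(batch_size, num_classes)` matrix, which matters mostly when the number of classes is large.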

I cannot implement it as just a function, because it has its own parameters that need to be optimized, and its output is added to the loss. I also cannot just use it as the final layer, because I need the softmax and the usual output as well. Could anyone please point me to where to look for a clue?

sgugger · December 17, 2018, 1:45pm

If it has parameters to optimize, it’ll need to be part of your model (since the optimizer is built from the model’s parameters for the training loop). If you want both the loss and the regular output, you can return two things from the model, then have a callback keep only the first one for the training loop and store the second (like RNNTrainer does).
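In plain PyTorch terms, the suggestion above might look like the following sketch. It does not use fastai's callback API; instead, the hypothetical `NetWithCenterLoss` model owns the class centers (so they appear in `model.parameters()`) and returns `(logits, features)`, and a hypothetical `combined_loss` adds a center-loss term with an assumed weight `lam=0.01` to cross-entropy. Layer sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NetWithCenterLoss(nn.Module):
    """Hypothetical toy model: backbone -> features -> logits, with class centers as a parameter."""

    def __init__(self, in_dim=20, feat_dim=2, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Because the centers live on the model, model.parameters() includes them,
        # so an optimizer built from the model also updates them.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, x):
        feats = self.backbone(x)
        # Return both the usual output and the features needed by the center loss.
        return self.classifier(feats), feats


def combined_loss(model, output, labels, lam=0.01):
    """Hypothetical helper: cross-entropy on the logits plus lam * center loss on the features."""
    logits, feats = output
    ce = F.cross_entropy(logits, labels)
    center = (feats - model.centers[labels]).pow(2).sum(dim=1).mean()
    return ce + lam * center


# One training step on random data, for illustration only.
model = NetWithCenterLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 20), torch.randint(0, 10, (8,))
loss = combined_loss(model, model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
```

In fastai, the part that unpacks the tuple would instead live in a callback, as described above; the key point is only that the centers must be registered on the model.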

ducha-aiki · December 17, 2018, 2:42pm

Thanks a lot! Is there a general overview of how to work with multi-head networks and custom regularizations (e.g., adding something to the loss that depends on the model rather than on the model output)?
Update: it seems that https://docs.fast.ai/callback.html#callback has everything I need.
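As a minimal plain-PyTorch illustration of a loss term that depends on the model rather than on its output, the hypothetical `loss_with_weight_penalty` below adds an L2 penalty over all trainable parameters, with an assumed coefficient `wd`. It is a sketch, not the fastai callback mechanism from the linked docs; in practice plain weight decay is usually set through the optimizer's `weight_decay` argument instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def loss_with_weight_penalty(model, logits, labels, wd=1e-4):
    """Hypothetical example: a loss term computed from the model's weights, not its output."""
    ce = F.cross_entropy(logits, labels)
    # Sum of squared values over every trainable parameter (a plain L2 penalty).
    penalty = sum(p.pow(2).sum() for p in model.parameters() if p.requires_grad)
    return ce + wd * penalty


# Usage on a toy linear classifier with random data.
model = nn.Linear(4, 3)
x, y = torch.randn(6, 4), torch.randint(0, 3, (6,))
loss = loss_with_weight_penalty(model, model(x), y, wd=1e-2)
loss.backward()
```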