Implementing center loss, which has its own parameters to optimize


I am trying to implement center loss (Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.), but I am struggling with the format.

The PyTorch implementation looks like this:

import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Center loss.

    Reference:
    Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.

    Args:
        num_classes (int): number of classes.
        feat_dim (int): feature dimension.
    """
    def __init__(self, num_classes=10, feat_dim=2, use_gpu=True):
        super(CenterLoss, self).__init__()
        self.num_classes = num_classes
        self.feat_dim = feat_dim
        self.use_gpu = use_gpu

        # One learnable center per class, created on the GPU if requested.
        if self.use_gpu:
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim).cuda())
        else:
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim))

    def forward(self, x, labels):
        """
        Args:
            x: feature matrix with shape (batch_size, feat_dim).
            labels: ground truth labels with shape (batch_size).
        """
        batch_size = x.size(0)
        distmat = torch.pow(x, 2).sum(dim=1, keepdim=True).expand(batch_size, self.num_classes) + \
                  torch.pow(self.centers, 2).sum(dim=1, keepdim=True).expand(self.num_classes, batch_size).t()
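        # distmat = 1 * distmat - 2 * (x @ centers^T); keyword form, since the positional (beta, alpha, ...) signature is deprecated
        distmat.addmm_(x, self.centers.t(), beta=1, alpha=-2)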

        classes = torch.arange(self.num_classes).long()
        if self.use_gpu: classes = classes.cuda()
        labels = labels.unsqueeze(1).expand(batch_size, self.num_classes)
        mask = labels.eq(classes.expand(batch_size, self.num_classes))

        dist = []
        for i in range(batch_size):
            value = distmat[i][mask[i]]
            value = value.clamp(min=1e-12, max=1e+12) # for numerical stability
            dist.append(value)
        dist = torch.cat(dist)
        loss = dist.mean()

        return loss
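
As a side note, the per-sample loop only picks out each feature's squared distance to the center of its own class. Here is a minimal sketch, assuming plain PyTorch on CPU, that compares the loop with a version that indexes the centers by label. The function names `center_loss_loop` and `center_loss_indexed` are made up for this illustration and are not part of the implementation above.

```python
import torch

def center_loss_loop(x, centers, labels):
    """Loop version, mirroring the distance-matrix approach above."""
    num_classes = centers.size(0)
    # Squared distances between every feature and every center: (batch, classes)
    distmat = (x.pow(2).sum(dim=1, keepdim=True)
               + centers.pow(2).sum(dim=1, keepdim=True).t()
               - 2.0 * x @ centers.t())
    # mask[i, j] is True only where j is the label of sample i
    mask = labels.unsqueeze(1) == torch.arange(num_classes).unsqueeze(0)
    dist = torch.cat([distmat[i][mask[i]] for i in range(x.size(0))])
    return dist.clamp(min=1e-12, max=1e+12).mean()

def center_loss_indexed(x, centers, labels):
    """Same quantity, computed by indexing each sample's own center."""
    diff = x - centers[labels]        # (batch, feat_dim)
    dist = diff.pow(2).sum(dim=1)     # squared distance per sample
    return dist.clamp(min=1e-12, max=1e+12).mean()

torch.manual_seed(0)
x = torch.randn(8, 2)
centers = torch.randn(10, 2, requires_grad=True)
labels = torch.randint(0, 10, (8,))

loop_value = center_loss_loop(x, centers, labels)
indexed_value = center_loss_indexed(x, centers, labels)
```

Both values should agree up to floating-point error, and gradients still reach `centers` in either version.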

I cannot implement it as just a function, because it has its own parameters to optimize and it is added to the loss. I also cannot just use it as the final layer, because I need the softmax and the usual output as well. Could anyone please point out where to look for a clue?

If it has parameters to optimize, it’ll need to be part of your model (since the optimizer takes the model's parameters for the training loop). If you want both the loss and the regular output, you can return two things from the model, then have a callback keep only the first one for the training loop and store the second (like the RNNTrainer).
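
To make the idea concrete, here is a rough sketch in plain PyTorch (not the callback approach itself) of a model that owns the centers and returns two outputs. The toy model `NetWithCenters`, the helper `joint_loss`, the layer sizes, and the weight of 0.01 are all assumptions made up for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetWithCenters(nn.Module):
    """Hypothetical toy model: returns (logits, features) and owns the centers."""
    def __init__(self, in_dim=20, feat_dim=2, num_classes=10):
        super().__init__()
        self.backbone = nn.Linear(in_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Registered as a parameter, so model.parameters() includes it.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, inputs):
        features = self.backbone(inputs)
        logits = self.classifier(features)
        return logits, features

def joint_loss(outputs, labels, centers, weight=0.01):
    """Cross-entropy on the logits plus a weighted center term on the features."""
    logits, features = outputs
    ce = F.cross_entropy(logits, labels)
    center = (features - centers[labels]).pow(2).sum(dim=1).mean()
    return ce + weight * center

torch.manual_seed(0)
model = NetWithCenters()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(16, 20)
labels = torch.randint(0, 10, (16,))

centers_before = model.centers.detach().clone()
optimizer.zero_grad()
loss = joint_loss(model(inputs), labels, model.centers)
loss.backward()
optimizer.step()  # updates the layers and the centers together
centers_changed = not torch.equal(centers_before, model.centers.detach())
```

Because the centers live inside the module, a single optimizer updates them along with the other weights, and the logits remain available for the usual softmax output.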


Thanks a lot! Is there a general overview of how to work with multi-head networks and custom regularizations (e.g. adding something to the loss that depends on the model, not on the model output)?
Upd.: seems like has everything I need.
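
For the regularization part of the question, one possible pattern in plain PyTorch is a loss function that receives the model itself. The sketch below is only an illustration under that assumption. The function name `loss_with_model_penalty`, the squared-weight penalty, and the weight of 1e-3 are chosen for the example and do not come from any library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def loss_with_model_penalty(model, logits, labels, weight=1e-3):
    """Cross-entropy plus a term computed from the model's own weights."""
    # Sum of squared parameters: depends on the model, not on its output.
    penalty = sum(p.pow(2).sum() for p in model.parameters())
    return F.cross_entropy(logits, labels) + weight * penalty

torch.manual_seed(0)
model = nn.Linear(4, 3)
inputs = torch.randn(5, 4)
labels = torch.randint(0, 3, (5,))

logits = model(inputs)
plain = F.cross_entropy(logits, labels)
total = loss_with_model_penalty(model, logits, labels)
```

The extra term is an ordinary tensor expression, so calling `total.backward()` sends gradients to the weights through both the output and the penalty.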