Can someone help me with this error

user_432 · July 23, 2020, 7:18am

I’m trying to calculate the pixel-wise custom loss for my Segmentation model. I have designed the training loop as follows:

def training(self, epoch):
        train_loss = 0.0
        self.model.train()
        tbar = tqdm(self.trainloader)
        for k, (image, target) in enumerate(tbar):
            self.scheduler(self.optimizer, k, epoch, self.best_pred)
            self.optimizer.zero_grad()
            image, target = image.cuda(), target.cuda()
            features = self.model(image)
            b,c,h,w = features.size()
            for i in range(h):
                for j in range(w):
                    f = features[:,:,i,j]
                    f = f.unsqueeze(2)          #[b,c,1]
                    t = target[:,i,j]           #[b]
                    loss = self.criterion(f, t)
                    loss.sum().backward()
                    self.optimizer.step()
            train_loss += loss.item()
            tbar.set_description('Train loss: %.3f' % (train_loss / (k + 1)))

And I’ve been getting this error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Can someone help me with this? I’m pretty new to PyTorch. Thanks :)

PalaashAgrawal · July 23, 2020, 7:26am

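I see you’re computing the loss and calling backward() inside a for loop, so backward() runs many times on the graph built by a single forward pass of the same batch. PyTorch frees that graph’s buffers after the first backward() call, which is why the next call fails. PyTorch expects the loss to be computed for the batch and backward() to be called on it once.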
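A minimal sketch of why the error appears, assuming a toy tensor in place of the real model output (the names `w`, `features`, and `second_backward_fails` are made up for illustration). Two slices of the same output share one graph, so the second backward() fails unless the first call kept the graph alive:

```python
import torch

def second_backward_fails(retain_graph):
    # Toy stand-in for a network output: one forward pass, one graph.
    w = torch.randn(4, requires_grad=True)
    features = torch.exp(w)  # exp() saves its output for the backward pass

    # Two "pixels" sliced from the same output share that graph.
    features[0].backward(retain_graph=retain_graph)
    try:
        features[1].backward()  # walks through the same exp() node again
    except RuntimeError:
        return True
    return False

print(second_backward_fails(retain_graph=False))  # saved buffers were freed
print(second_backward_fails(retain_graph=True))   # graph was kept alive
```

Keeping the graph with retain_graph=True avoids the exception, but every pixel then triggers its own full backward pass, which is likely to be very slow for a whole image.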

Though I don’t know exactly what you’re trying to achieve. I might be able to help more if you could provide some background.

user_432 · July 23, 2020, 7:56am

Hi @PalaashAgrawal, thanks for the reply. What I’m basically trying to do is apply SupConLoss (https://arxiv.org/pdf/2004.11362.pdf), defined below, in a segmentation pipeline. I want to define this SupConLoss at the pixel level: if two pixels are labeled with the same class, I want to make them a positive pair; otherwise, a negative pair.
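The pixel-level pairing described above can be sketched as a boolean comparison of flattened labels, the same `torch.eq(labels, labels.T)` pattern the loss uses for its mask. The 2x2 label map here is a made-up example, not data from the thread:

```python
import torch

# Hypothetical 2x2 label map for a single image (values are class ids).
target = torch.tensor([[0, 1],
                       [1, 0]])

labels = target.reshape(-1, 1)             # [h*w, 1], one row per pixel
mask = torch.eq(labels, labels.T).float()  # [h*w, h*w]; 1 = positive pair

print(mask)
```

Note that the mask has (h*w)^2 entries, so for full-resolution images it may be necessary to work on a subset of pixels; that choice is outside what the thread specifies.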

class SupConLoss(nn.Module):
    def __init__(self, temperature=0.07, contrast_mode='all',
                 base_temperature=0.07):
        super(SupConLoss, self).__init__()
        self.temperature = temperature
        self.contrast_mode = contrast_mode
        self.base_temperature = base_temperature

    def forward(self, features, labels=None, mask=None):
        """Compute loss for model. If both `labels` and `mask` are None,
        it degenerates to SimCLR unsupervised loss:
        https://arxiv.org/pdf/2002.05709.pdf

        Args:
            features: hidden vector of shape [bsz, n_views, ...].
            labels: ground truth of shape [bsz].
            mask: contrastive mask of shape [bsz, bsz], mask_{i,j}=1 if sample j
                has the same class as sample i. Can be asymmetric.
        Returns:
            A loss scalar.
        """
        device = (torch.device('cuda')
                  if features.is_cuda
                  else torch.device('cpu'))
        print(features.shape)
        if len(features.shape) < 3:
            raise ValueError('`features` needs to be [bsz, n_views, ...],'
                             'at least 3 dimensions are required')
        if len(features.shape) > 3:
            features = features.view(features.shape[0], features.shape[1], -1)
        print(features.shape)
        batch_size = features.shape[0]
        if labels is not None and mask is not None:
            raise ValueError('Cannot define both `labels` and `mask`')
        elif labels is None and mask is None:
            mask = torch.eye(batch_size, dtype=torch.float32).to(device)
        elif labels is not None:
            print(labels.shape)
            labels = labels.contiguous().view(-1, 1)
            if labels.shape[0] != batch_size:
                raise ValueError('Num of labels does not match num of features')
            mask = torch.eq(labels, labels.T).float().to(device)
            print(mask.shape)
        else:
            mask = mask.float().to(device)

        contrast_count = features.shape[1]
        contrast_feature = torch.cat(torch.unbind(features, dim=1), dim=0)
        if self.contrast_mode == 'one':
            anchor_feature = features[:, 0]
            anchor_count = 1
        elif self.contrast_mode == 'all':
            anchor_feature = contrast_feature
            anchor_count = contrast_count
        else:
            raise ValueError('Unknown mode: {}'.format(self.contrast_mode))

        # compute logits
        anchor_dot_contrast = torch.div(
            torch.matmul(anchor_feature, contrast_feature.T),
            self.temperature)
        # for numerical stability
        logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
        logits = anchor_dot_contrast - logits_max.detach()

        # tile mask
        mask = mask.repeat(anchor_count, contrast_count)
        # mask-out self-contrast cases
        logits_mask = torch.scatter(
            torch.ones_like(mask),
            1,
            torch.arange(batch_size * anchor_count).view(-1, 1).to(device),
            0
        )
        mask = mask * logits_mask

        # compute log_prob
        exp_logits = torch.exp(logits) * logits_mask
        log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

        # compute mean of log-likelihood over positive
        mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)

        # loss
        loss = - (self.temperature / self.base_temperature) * mean_log_prob_pos
        loss = loss.view(anchor_count, batch_size).mean()

        return loss

user_432 · July 23, 2020, 7:58am

Hey @joedockrill, I’ve tried retain_graph=True, but the code seems to be stuck, even though GPU usage is at 95%.

PalaashAgrawal · July 23, 2020, 8:14am

I see. But either way, your loss would be a single real number per image, right? (Assuming you’re doing classification.) And you would call backward() only once per image, after the loss is calculated. Am I right?

user_432 · July 23, 2020, 8:30am

Yeah, I think you are right.

PalaashAgrawal · July 23, 2020, 8:34am

So yeah, you’ll need to move loss.backward() out of the for loop. I think that would solve your issue.
As for the rest, since I haven’t read the paper, I won’t be able to help with the logic of the loss calculation, but that too might raise an error (e.g., if you’re adding two values with different dtypes, or there’s a dimension mismatch).

user_432 · July 23, 2020, 8:36am

But if we move loss.backward() out of the for loop, won’t it compute the loss only for the last pixel?
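One way to address this concern, sketched under assumptions: accumulate the per-pixel losses into a single tensor inside the loop, then call backward() and optimizer.step() once per batch. The tiny `nn.Conv2d` model, the random data, and `nn.CrossEntropyLoss` are hypothetical stand-ins chosen so the example runs on its own; they are not the model or the SupConLoss criterion from the thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Conv2d(3, 4, kernel_size=1)    # hypothetical tiny "segmentation" model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()         # stand-in for the per-pixel criterion

image = torch.randn(2, 3, 4, 4)           # [b, 3, h, w]
target = torch.randint(0, 4, (2, 4, 4))   # [b, h, w]

optimizer.zero_grad()
features = model(image)                   # [b, c, h, w], single forward pass
b, c, h, w = features.size()

total_loss = 0.0
for i in range(h):
    for j in range(w):
        f = features[:, :, i, j]          # [b, c]
        t = target[:, i, j]               # [b]
        # Accumulate instead of overwriting, so every pixel contributes.
        total_loss = total_loss + criterion(f, t)

total_loss = total_loss / (h * w)         # average over pixels
total_loss.backward()                     # one backward pass for the batch
optimizer.step()                          # one parameter update per batch
print(total_loss.item())
```

Because the sum is built before backward() runs, the gradient includes every pixel rather than only the last one, and the graph is traversed a single time. For large images a vectorized formulation would likely be much faster than the Python double loop.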