Custom loss IOU Runtime error

meli · October 29, 2018, 11:49am

Hi,
I am working on a multiclass segmentation task, and I’ve been stuck trying to implement a loss based on the IoU metric.
I am using fastai v1.0.11.
I run into the following error when I pass my loss function (which is supposed to return 1 - IoU) to my learner:

learner = Learner(bunch, custom_model, loss_func=IOU(), metrics=[iou],  callback_fns=ShowGraph)
learner.fit_one_cycle(1)

<ipython-input-127-158e79851ed2> in <module>()
----> 1 learner.fit_one_cycle(1)

~/anaconda3/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     17     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     18                                         pct_start=pct_start, **kwargs))
---> 19     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     20 
     21 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, **kwargs:Any):

~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    134         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    135         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 136             callbacks=self.callbacks+callbacks)
    137 
    138     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     86     except Exception as e:
     87         exception = e
---> 88         raise e
     89     finally: cb_handler.on_train_end(exception)
     90 

~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     76             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     77                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 78                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)[0]
     79                 if cb_handler.on_batch_end(loss): break
     80 

~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     23     if opt is not None:
     24         loss = cb_handler.on_backward_begin(loss)
---> 25         loss.backward()
     26         cb_handler.on_backward_end()
     27         opt.step()

~/anaconda3/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    100                 products. Defaults to ``False``.
    101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    103 
    104     def register_hook(self, hook):

~/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91 
     92 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Here is my implementation of the IOU loss class and the iou metric (not the most efficient, but it works):

class IOU(torch.nn.Module):
    def __init__(self, classes = np.array(range(13))):
        super(IOU,self).__init__()
        self.classes = classes
    def forward(self, input, target):    
        return (1- iou(input, target, self.classes))

#iou metrics
def tp_fp_fn_1class(pred, gt):
    """
    :param pred : binary tensor of predictions for that class of shape image shape
    :param gt: ground truth tensor
    :returns : true positive, false positive, false negative rate for that class
    """
    tp = (pred*gt).sum(dim=0).sum().float()
    fp = ((pred - gt) > 0).sum(dim=0).sum().float()
    fn = ((gt - pred) > 0).sum(dim=0).sum().float()
    return tp, fp, fn

def iou_1class(tp,fp,fn):
    return(tp/(tp+fp+fn))

def iou(input:Tensor, targs:Tensor, classes = np.array(range(13)),ignore_class = [0]):
    preds_ = input.argmax(dim = 1)
    num_sample = input.shape[0]
    ious = np.zeros((num_sample,len(classes)))
    classes_ = np.array(classes.copy())
    if ignore_class != None:
        for i in ignore_class: 
            classes_ = np.delete(classes_, np.where(classes_ == i)[0])
    num_class = len(classes_)
    for i in range(num_sample):
        pred = preds_[i] 
        gt = targs[i]
        for c in classes_:
            tp, fp, fn = tp_fp_fn_1class((pred==c).float(), (gt==c).float())
            ious[i][c] = iou_1class(tp,fp,fn)
            if np.isnan(ious[i][c]):
                ious[i][c]= 0
    avg_im = np.sum(ious, axis = 0)/num_sample
    avg_iou = np.sum(avg_im)/num_class
    return tensor(avg_iou)

Thank you very much for your help!

sgugger · October 29, 2018, 1:35pm

You can’t go through NumPy in your loss function; otherwise PyTorch can’t track the operations and won’t be able to backpropagate. This is what is causing your error.
You should rewrite your iou function entirely in PyTorch.
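A minimal sketch of why this fails, assuming a recent PyTorch. The tensor names below are illustrative and not taken from the original post. A value that takes a round trip through NumPy comes back as a fresh tensor with no `grad_fn`, and calling `backward()` on it raises the error from the traceback:

```python
import numpy as np
import torch

# Stand-in for a model output: a tensor that autograd is tracking.
logits = torch.randn(4, 3, requires_grad=True)

# Staying in PyTorch keeps the computation graph attached.
loss_torch = 1 - logits.sigmoid().mean()
print(loss_torch.grad_fn is not None)

# A round trip through NumPy discards the graph: the result is a new leaf tensor.
# (detach() is required here; calling .numpy() on a tracked tensor is refused.)
value = np.mean(logits.detach().numpy())
loss_numpy = torch.tensor(1 - value)
print(loss_numpy.requires_grad, loss_numpy.grad_fn)

try:
    loss_numpy.backward()
except RuntimeError as err:
    numpy_error = str(err)  # "element 0 of tensors does not require grad ..."
    print(numpy_error)
```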

jeremy · October 29, 2018, 1:44pm

You can use our implementation as a starting point:

http://docs.fast.ai/metrics.html#dice

meli · October 30, 2018, 8:56am

Thank you very much. I rewrote everything in PyTorch and checked that I get the same results as with my previous implementation, but I still get the same error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Here is my new piece of code:

def iou_multiclass(input:Tensor, targs:Tensor,  classes = 13)->Rank0Tensor:
    "Dice coefficient metric for binary target. If iou=True, returns iou metric, classic for segmentation problems."
    n = targs.shape[0]
    input = input.argmax(dim=1).view(n,-1)
    targs = targs.view(n,-1)
    
    ious = torch.zeros([n,classes])
    
    #ignore class 0
    for c in range(1, classes):
        input_ = (input==c)
        targs_ = (targs==c)
        intersect_ = (input_*targs_).sum(dim = 1).float()
        union_ = (input_+targs_).sum(dim = 1).float()
        ious[:,c] = intersect_ / (union_-intersect_+1.0)
    print(ious)
    res = ious.sum(dim = 0)/n
    
    res = res.sum()/(classes - 1)
    return tensor(res)

class IOUMultLoss(torch.nn.Module):
    def __init__(self, classes = 13):
        super(IOUMultLoss,self).__init__()
        self.classes = classes
    def forward(self, input, target):    
        return (1- iou_multiclass(input, target, self.classes))

Thanks again for your help !

sgugger · October 30, 2018, 1:47pm

That is weird. The problem is again that your result doesn’t have requires_grad=True, but I don’t see what in your code causes this. To debug, I’d suggest running your steps one by one to see at which point your input tensor (which should have a grad_fn, since it’s the output of the module) loses it.
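One way to do that step-by-step check is sketched below, assuming a random tensor as a stand-in for the model output. The `report` helper is hypothetical, written only for this illustration. It prints `requires_grad` and `grad_fn` after each operation so the step that drops the graph stands out:

```python
import torch

# Hypothetical stand-in for a segmentation model output: (batch, classes, H, W).
logits = torch.randn(2, 13, 8, 8, requires_grad=True)

def report(name, t):
    """Print whether autograd is still tracking tensor `t` and return the flag."""
    print(f"{name:8s} requires_grad={t.requires_grad} grad_fn={t.grad_fn}")
    return t.requires_grad

steps = {}

probs = logits.softmax(dim=1)        # differentiable: graph is kept
steps["softmax"] = report("softmax", probs)

preds = logits.argmax(dim=1)         # integer class indices: graph is dropped
steps["argmax"] = report("argmax", preds)

mask = (preds == 1).float()          # built from untracked integers, still no graph
steps["mask"] = report("mask", mask)
```

In this sketch the flag is lost at the `argmax` line, and everything computed from its result stays untracked.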

meli · October 30, 2018, 4:46pm

If I return res.requires_grad_(True) in my iou_multiclass function, I don’t get any error.
Edit: actually, as long as I put .requires_grad_(True) somewhere (in the forward function of IOUMultLoss, or when assigning res in iou_multiclass), it works.
But the model does not seem to learn anything: accuracy stays around 0.06 after 15 epochs, whereas it reached about 0.6 after only 2 epochs when I was using CrossEntropyFlat() instead of the IOU loss function.
What am I missing here?
Thank you !

sgugger · October 30, 2018, 7:09pm

No, returning res.requires_grad_(True) won’t help you: your loss is detached from the history of how it was created, so no gradient flows back to the model’s parameters, hence the absence of training.
You really need to debug where that requires_grad flag disappears from your input.
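A small sketch of this effect, assuming a tiny `torch.nn.Linear` as a hypothetical stand-in for the segmentation network. Forcing `requires_grad_(True)` on a detached loss lets `backward()` run, but the parameters never receive a gradient:

```python
import torch

# Tiny hypothetical model standing in for the segmentation network.
model = torch.nn.Linear(5, 3)
x = torch.randn(4, 5)

logits = model(x)
# argmax cuts the graph: `hard` has no connection back to the model's weights.
hard = logits.argmax(dim=1).float().mean()

loss = (1 - hard).requires_grad_(True)  # silences the RuntimeError ...
loss.backward()                         # ... so backward() now runs

# ... but the loss is a leaf with no history, so nothing reaches the parameters.
print(loss.grad_fn)
print(model.weight.grad)
```

Since the optimizer only updates parameters that have a gradient, the weights stay where they were initialised.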

meli · October 31, 2018, 1:28pm

Thanks !
It seems that up until my loss function, the requires_grad flag is set to True for my input, …
I am working on another version of the iou loss which doesn’t involve any for loop (I don’t know if that can be the source of the problem, but I’ll try).
Edit : I have the exact same error with this second version of IoU loss …

def iou_multiclass_v2(input:Tensor, targs:Tensor, classes = 13)->Rank0Tensor:
  
    eps = 1e-15
    print(input.requires_grad)
   #yields True 
    n, c,w,h = input.shape
    range_tensor_ = torch.stack([torch.arange(classes)]*w*h, dim=1).view(classes,-1).cuda()
    range_tensor_batch_ = torch.stack([range_tensor_]*n, dim = 1).float().cuda()
    
    input_ = (torch.stack([input.argmax(dim = 1)]*classes)).view(c,n, -1).float().cuda()
    targ = torch.stack([targs.squeeze(1)]*classes).float().view(c,n, -1).cuda()
  #target is replicated n_class times to be compared with range_tensor_batch, which is a stack of [tensor of zeros, tensor of ones, ... tensor of (num_classes)]

    input_ = (input_==range_tensor_batch_).float()
    targs_ = (targ == range_tensor_batch_).float()
    intersect_ = (input_*targs_).sum(dim = 2).float()
    union_ = (input_+targs_).sum(dim = 2).float()
    ious = (intersect_ + eps)/ (union_-intersect_+eps)
    res = ious.sum(dim = 0)/n
    res = res.sum()/(classes )
    return tensor(res)

Edit :
Actually, I figured out that the error probably does not come from my iou metric function itself; I think it comes from the use of argmax, or some other non-differentiable function I am using…

ilovescience · April 11, 2019, 3:04am

Indeed, it has to do with the fact that argmax is not differentiable. Instead, a soft loss can be implemented by computing the loss from the predicted probabilities rather than from the argmax predictions.
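A minimal sketch of such a soft IoU loss, under a few assumptions: the model outputs raw logits of shape (N, C, H, W), the target holds integer class indices, and class 0 is left out of the average as in the posts above. `SoftIoULoss` and its parameters are names made up for this illustration; this is not fastai's implementation.

```python
import torch
import torch.nn.functional as F

class SoftIoULoss(torch.nn.Module):
    """Differentiable (soft) IoU loss; an illustrative sketch only."""
    def __init__(self, ignore_index=0, eps=1e-6):
        super().__init__()
        self.ignore_index = ignore_index  # class excluded from the average (assumption)
        self.eps = eps                    # avoids division by zero for absent classes

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) or (N, 1, H, W) indices
        n_classes = logits.shape[1]
        if target.dim() == 4:
            target = target.squeeze(1)
        probs = logits.softmax(dim=1)                              # (N, C, H, W)
        onehot = F.one_hot(target.long(), n_classes)               # (N, H, W, C)
        onehot = onehot.permute(0, 3, 1, 2).to(probs.dtype)        # (N, C, H, W)

        # Soft counts: probabilities replace the hard 0/1 predictions.
        inter = (probs * onehot).sum(dim=(2, 3))                   # (N, C)
        union = (probs + onehot - probs * onehot).sum(dim=(2, 3))  # (N, C)
        iou = (inter + self.eps) / (union + self.eps)

        keep = [c for c in range(n_classes) if c != self.ignore_index]
        return 1 - iou[:, keep].mean()

# Usage with random data in place of a real batch.
logits = torch.randn(2, 13, 16, 16, requires_grad=True)
target = torch.randint(0, 13, (2, 16, 16))
loss = SoftIoULoss()(logits, target)
loss.backward()
print(loss.grad_fn is not None, logits.grad.shape)
```

The hard, argmax-based IoU can still be passed as a metric, since metrics are only reported and never backpropagated.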

stephan.kuo · September 9, 2020, 6:39am

I have the same problem when implementing dice loss: ‘grad_fn’ becomes None after argmax. Since argmax is not differentiable, I’m wondering how exactly dice loss or IoU loss is implemented?
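One common approach, sketched here under assumptions rather than taken from any particular library, is to skip argmax inside the loss and compute Dice directly on the softmax probabilities. `soft_dice_loss` is a hypothetical helper that expects logits of shape (N, C, H, W) and integer targets of shape (N, H, W):

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """Return 1 - mean soft Dice over classes (illustrative sketch)."""
    n_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target.long(), n_classes).permute(0, 3, 1, 2).to(probs.dtype)
    # Sum over batch and spatial dimensions, keeping one value per class.
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    total = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = (2 * inter + eps) / (total + eps)
    return 1 - dice.mean()

# Usage with random data in place of a real batch.
logits = torch.randn(2, 4, 8, 8, requires_grad=True)
target = torch.randint(0, 4, (2, 8, 8))
loss = soft_dice_loss(logits, target)
loss.backward()
print(loss.grad_fn is not None, logits.grad.shape)
```

Because every operation from `logits` to `loss` is differentiable, `grad_fn` is preserved and gradients reach the model.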