RetniaNet Focal Loss class targets are empty

Loki1725 · November 12, 2019, 6:50am

I working on a detection model based off the Pascal.ipynb in the course material. My data is comprised of small objects in a mostly empty field, so the RetinaNet model seems like a good fit. Right now I’m starting with a simple case with only 1 class in my images, and I’ll train more classes in as the data becomes available.

I’m trying to find the learning rate, but the RetinaNetFocalLoss function keeps crashing. The problem is one of the clas_tgt tensors getting passed into the function is all zeros, so the torch.min function is generating an error.
" in _unpad(failed resolving arguments)
23 print("bbox_tgts: “, bbox_tgt)
24 print(“self.pad_idx”, self.pad_idx)
—> 25 i = torch.min(torch.nonzero(clas_tgt-self.pad_idx))
26 return tlbr2cthw(bbox_tgt[i:]), clas_tgt[i:]-1+self.pad_idx
27
RuntimeError: invalid argument 1: cannot perform reduction function min on tensor with no elements because the operation does not have an identity at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathReduce.cu:64”

For debug I added some print statements on the bounding boxes targets and the class targets. I see these generating ‘good’ data for the first pass through, then the class target tensors is all zeros.

"output:
[tensor([[[-4., -4.],
[-4., -4.],
[-4., -4.],
…,
[-4., -4.],
[-4., -4.],
[-4., -4.]],

    [[-4., -4.],
     [-4., -4.],
     [-4., -4.],
     ...,
     [-4., -4.],
     [-4., -4.],
     [-4., -4.]],

    [[-4., -4.],
     [-4., -4.],
     [-4., -4.],
     ...,
     [-4., -4.],
     [-4., -4.],
     [-4., -4.]],

    [[-4., -4.],
     [-4., -4.],
     [-4., -4.],
     ...,
     [-4., -4.],
     [-4., -4.],
     [-4., -4.]]], device='cuda:0', grad_fn=<CatBackward>), 
     tensor([[[0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     ...,
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.]],

    [[0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     ...,
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.]],

    [[0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     ...,
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.]],

    [[0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     ...,
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.]]], device='cuda:0', grad_fn=<CatBackward>),

[[32, 32], [64, 64], [8, 8], [4, 4], [2, 2]]]

clas_tgts: tensor([1, 1, 1, 1], device=‘cuda:0’)
bbox_tgts: tensor([[-0.7116, 0.2994, -0.4689, 0.5388],
[ 0.1855, 0.7000, 0.4048, 0.9218],
[ 0.5571, -0.7715, 0.7984, -0.4500],
[-0.8330, 0.2960, -0.6692, 0.4765]], device=‘cuda:0’)
self.pad_idx 0

clas_tgts: tensor([0, 0, 1, 1], device=‘cuda:0’)
bbox_tgts: tensor([[0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000],
[0.5469, 0.7584, 0.6863, 1.0000],
[0.4406, 0.7306, 0.5344, 0.8733]], device=‘cuda:0’)
self.pad_idx 0

clas_tgts: tensor([1, 1, 1, 1], device=‘cuda:0’)
bbox_tgts: tensor([[-0.0200, -0.6015, 0.6444, -0.4047],
[-0.1858, -0.6790, 0.0828, -0.5157],
[ 0.5301, 0.5670, 0.7313, 0.7774],
[ 0.1561, 0.3476, 0.2968, 0.4721]], device=‘cuda:0’)
self.pad_idx 0

clas_tgts: tensor([0, 0, 1, 1], device=‘cuda:0’)
bbox_tgts: tensor([[ 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.4396, 0.2111, 0.7538, 0.4614],
[-0.1625, -0.1494, 0.1980, 0.1113]], device=‘cuda:0’)
self.pad_idx 0

output:
[tensor([[[-4.0000, -4.0000],
[-4.0000, -4.0000],
[-4.0000, -4.0000],
…,
[-4.0000, -4.0000],
[-4.0000, -4.0000],
[-4.0000, -4.0000]],

    [[-3.9999, -4.0000],
     [-3.9999, -4.0000],
     [-3.9999, -4.0000],
     ...,
     [-4.0000, -4.0000],
     [-4.0000, -4.0000],
     [-4.0000, -4.0000]],

    [[-3.9999, -4.0000],
     [-3.9999, -4.0000],
     [-3.9999, -4.0000],
     ...,
     [-4.0000, -4.0000],
     [-4.0000, -4.0000],
     [-4.0000, -4.0000]],

    [[-3.9999, -4.0000],
     [-3.9999, -4.0000],
     [-3.9999, -4.0000],
     ...,
     [-4.0000, -4.0000],
     [-4.0000, -4.0000],
     [-4.0000, -4.0000]]], device='cuda:0', grad_fn=<CatBackward>), 
     tensor([[[ 5.8823e-05,  3.8971e-05,  5.0638e-05, -6.1263e-05],
     [ 4.3368e-05,  4.2452e-05,  5.9110e-05, -5.6631e-05],
     [ 2.3543e-05, -1.8120e-06,  6.0415e-05, -5.9552e-05],
     ...,
     [ 9.0789e-07,  6.5124e-06, -9.2335e-06,  8.7830e-06],
     [ 5.7327e-07, -3.7689e-06, -6.4236e-06,  8.1218e-06],
     [-3.6391e-06,  6.1791e-06, -7.5774e-06,  7.5879e-06]],

    [[ 6.3462e-05,  3.8431e-05,  5.2794e-05, -6.6446e-05],
     [ 4.4774e-05,  4.5052e-05,  6.3032e-05, -5.9442e-05],
     [ 1.7675e-05, -1.4633e-06,  6.6049e-05, -6.6170e-05],
     ...,
     [ 8.7837e-07,  6.7707e-06, -9.4765e-06,  9.1426e-06],
     [ 3.7155e-07, -3.9938e-06, -6.7399e-06,  8.4144e-06],
     [-4.1812e-06,  6.7351e-06, -7.7293e-06,  7.7419e-06]],

    [[ 6.3581e-05,  3.9795e-05,  5.4961e-05, -6.7096e-05],
     [ 4.5167e-05,  4.3474e-05,  6.3230e-05, -6.1343e-05],
     [ 1.9188e-05, -3.1147e-06,  6.7213e-05, -6.6160e-05],
     ...,
     [ 8.4529e-07,  8.3856e-06, -1.1132e-05,  1.0754e-05],
     [-2.4030e-07, -4.5468e-06, -8.1089e-06,  1.0164e-05],
     [-4.8756e-06,  7.5789e-06, -9.4542e-06,  9.0894e-06]],

    [[ 6.5023e-05,  4.1934e-05,  5.4422e-05, -6.7103e-05],
     [ 4.9524e-05,  4.6245e-05,  6.4738e-05, -6.2913e-05],
     [ 2.1542e-05, -4.6008e-06,  6.6489e-05, -6.6718e-05],
     ...,
     [ 6.3114e-07,  6.1879e-06, -8.6072e-06,  8.1754e-06],
     [ 5.6170e-07, -3.4561e-06, -6.0266e-06,  7.6720e-06],
     [-3.6853e-06,  5.9568e-06, -7.2529e-06,  6.8260e-06]]],
   device='cuda:0', grad_fn=<CatBackward>),
   [[32, 32], [64, 64], [8, 8], [4, 4], [2, 2]]]

clas_tgts: tensor([0, 1, 1], device=‘cuda:0’)
bbox_tgts: tensor([[ 0.0000, 0.0000, 0.0000, 0.0000],
[-0.6563, 0.3660, 0.0242, 0.8674],
[ 0.0959, 0.4811, 0.3723, 0.8862]], device=‘cuda:0’)
self.pad_idx 0

clas_tgts: tensor([0, 0, 0], device=‘cuda:0’)
bbox_tgts: tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]], device=‘cuda:0’)
self.pad_idx 0"

I’ve spent the last few days trying to go through my data set to ensure I don’t have any images with no objects in them. And I’ve attempted to modify the RetniaNet model, and Loss Function parameters a bit to see if this will change anything. Specifically I changed alpha to be 0.1 as many images only have a small object in them, and I changed gamma to 5.0, as the Lin/Goyal/Girshick/He/Dollar paper recommended.

Does anyone here have any suggestions on what I might look next? I’m pretty sure my issue is with my input data, but I don’t know what to look for next.

Honigtoast · November 12, 2019, 12:12pm

Hi Nathan,
are your images square images or rectangle? I probably encountered the same problem a few days ago and found the solution in this thread. As the resizing method cuts off some parts of the images during transforms it can happen that the only parts with bboxes are cut off and then the error happens. When using SQUISH as resize method, it works (at least that di it for me).
And if you are using rectangle images, you have to pass the size of the images to get_data as well: for example get_data("bs",128) resizes to square images and get_data("bs", (80,128)) would be the size of some rectangle image. The size is passed as height x width.
Maybe this can help you fix your code, too

Loki1725 · November 26, 2019, 6:38pm

That was it. I made some changes to pass in rectangular data and now it’s working. Thank you for the suggestion!

Honigtoast · November 27, 2019, 10:14am

Good to hear it is working now! Happy to help!