Error in get_y(bbox,clas), Lesson 9, Multi Object

I am trying to use the Lesson 9 multi-object detection notebook on my own data, but the function

def get_y(bbox,clas):
    bbox = bbox.view(-1,4)/sz
    bb_keep = ((bbox[:,2]-bbox[:,0])>0).nonzero()[:,0]
    return bbox[bb_keep],clas[bb_keep]

gives an error that seems to be related to an empty bbox.
Here is the code to reproduce the error:

bbox = np.array([[0.5000, 0.2009, 0.5000, 0.6071],
                 [0.4643, 0.5938, 0.4643, 0.6964]])
bbox = VV(bbox)
((bbox[:,2]-bbox[:,0])>0).nonzero()[:,0]  # raises the IndexError below

and the error is:

IndexError                                Traceback (most recent call last)
in ()
----> 1 ((bbox[:,2]-bbox[:,0])>0).nonzero()[:,0]

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/ in __getitem__(self, key)
     76             return IndexSelect.apply(self, 0, key)
     77         # else fall through and raise an error in Index
---> 78         return Index.apply(self, key)
     80     def __setitem__(self, key, value):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/_functions/ in forward(ctx, i, index)
     87             result = i.index(ctx.index)
     88         else:
---> 89             result = i.index(ctx.index)
     90         ctx.mark_shared_storage((i, result))
     91         return result

IndexError: trying to index 2 dimensions of a 0 dimensional tensor

I am wondering why get_y is being fed an empty bbox (the left and right coordinates of each box are the same)?
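The failing index can be reproduced without fastai or a GPU. Here is a minimal sketch using NumPy as a stand-in for the torch Variable (an assumption for illustration only): because the two corner coordinates of each box are equal, the keep mask is all False and the index set comes back empty.

```python
import numpy as np

# NumPy stand-in for the bbox Variable above; in each row the
# coordinates at positions 0 and 2 are equal, i.e. zero-width boxes
bbox = np.array([[0.5000, 0.2009, 0.5000, 0.6071],
                 [0.4643, 0.5938, 0.4643, 0.6964]])

keep = np.flatnonzero((bbox[:, 2] - bbox[:, 0]) > 0)
print(keep)        # empty: no box survives the width check
print(bbox[keep])  # empty (0, 4) array
```

NumPy tolerates indexing with an empty array; the old torch autograd Variable instead raised the IndexError shown above.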

I’m having the same issue. As seen in the pdb below, nonzero() returns an empty tensor since it is being passed an all-zero tensor.

I wonder if this is because my data only has true bounding boxes for some images… I have tried passing in NaN NaN NaN NaN for the ‘no box’ images, as well as 0 0 0 0, and I run into the same error both times. I was able to work through the single image classification+bbox notebook without issues.

(Pdb) p bbox.view(-1,4)/sz
Variable containing:
0 0 0 0
0 0 0 0
0 0 0 0
[torch.cuda.FloatTensor of size 3x4 (GPU 0)]
(Pdb) p ((bbox[:,2]-bbox[:,0])>0).nonzero()
Variable containing:
[torch.cuda.LongTensor with no dimension]

(Pdb) ((bbox[:,2]-bbox[:,0])>0).nonzero()[:,0]
*** IndexError: trying to index 2 dimensions of a 0 dimensional tensor
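For what it's worth, both paddings fail the width check the same way. A NumPy sketch of the comparison (not the fastai code itself) shows why: 0 - 0 = 0 and NaN - NaN = NaN, and neither 0 > 0 nor NaN > 0 is True, so the mask has no nonzero entries either way.

```python
import numpy as np

# Two padding schemes for 'no box' images: all zeros vs all NaNs
zero_pad = np.zeros((3, 4))
nan_pad = np.full((3, 4), np.nan)

for pad in (zero_pad, nan_pad):
    widths = pad[:, 2] - pad[:, 0]   # 0 - 0 = 0; NaN - NaN = NaN
    mask = widths > 0                # 0 > 0 and NaN > 0 are both False
    print(np.flatnonzero(mask))      # empty in both cases
```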

After some struggling I figured out what is going wrong, and it has to do with the resizing of small bounding boxes.
If you have high-resolution images with very small bounding boxes around small objects, then when you resize the image to satisfy the constraints of your backbone and custom head, it is likely that the height (or width) of the small boxes collapses to zero, and then you will have this problem.
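The collapse is plain arithmetic. A sketch with made-up numbers (the sizes and box coordinates are all hypothetical): a 3-pixel-wide box exists in the original image but rounds to zero width at the target size.

```python
# Hypothetical numbers: a small box in a high-resolution image
orig_size = 1792          # original image width in pixels
target_size = 224         # size the transforms resize to
x0, x1 = 1000, 1003       # a 3-pixel-wide box

scale = target_size / orig_size
new_x0, new_x1 = round(x0 * scale), round(x1 * scale)
print(new_x0, new_x1)     # both coordinates round to 125: width is now 0
```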
For me the solution was to use higher resolution images (setting sz=4*224) and compensate for the larger grid size by adding two extra stride-2 convolution layers on top of the backbone. Moreover, to be on the safe side, I added an extra check to the ssd_loss function:

def ssd_loss(pred,targ,print_it=False):
    lcs,lls = 0.,0.
    for b_c,b_bb,bbox,clas in zip(*pred,*targ):
        # skip samples whose boxes all collapsed to zero width after resizing
        if len(((bbox.view(-1,4)[:,2]-bbox.view(-1,4)[:,0])>0).nonzero())==0:
            continue
        loc_loss,clas_loss = ssd_1_loss(b_c,b_bb,bbox,clas,print_it)
        lls += loc_loss
        lcs += clas_loss
    if print_it: print(f'loc: {lls.data[0]}, clas: {lcs.data[0]}')
    return lls+lcs
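The effect of that guard can be sketched in isolation, with NumPy and a dummy loss standing in for ssd_1_loss (both are hypothetical stand-ins): samples whose boxes all have zero width are skipped before they can trigger the empty-index error.

```python
import numpy as np

def dummy_loss(bbox):
    # stand-in for ssd_1_loss: just count the surviving boxes
    return len(bbox)

batch = [
    np.array([[0.1, 0.1, 0.3, 0.4]]),   # one valid box
    np.array([[0.5, 0.2, 0.5, 0.6]]),   # zero-width box: skipped
]
total = 0
for bbox in batch:
    keep = np.flatnonzero((bbox[:, 2] - bbox[:, 0]) > 0)
    if len(keep) == 0:
        continue                         # same guard as in ssd_loss
    total += dummy_loss(bbox[keep])
print(total)   # -> 1: only the valid sample contributed
```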