Bbox, padding, and get_y in object detection

Jeremy’s implementation of get_y is as follows:

def get_y(bbox, clas):
    bbox = bbox.view(-1, 4) #/sz
    bb_keep = ((bbox[:, 2] - bbox[:, 0]) > 0).nonzero()[:, 0]
    return bbox[bb_keep], clas[bb_keep]

I wonder if this can be simplified as:

def get_y(bbox, clas):
    # Padding rows are all zeros, so a positive third coordinate marks a real box
    keep = bbox[:, 2] > 0
    return bbox[keep], clas[keep]

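Basically, if the third column of a row in bbox is not positive, that row is just padding and is dropped. This assumes that coordinates are non-negative and that every real box has a positive third coordinate. Also, bbox.view(-1, 4) is a no-op when the last dimension of bbox is already 4, as it is here. I'm not sure what the reason is for dividing by sz.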
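To check the idea on something small, here is a rough sketch comparing the two versions on a toy target. The tensor values, the class ids, and the names get_y_original and get_y_simplified are made up for illustration, and the layout of the padding rows is an assumption rather than something taken from fastai.

```python
import torch

def get_y_original(bbox, clas):
    # Version quoted above: keep rows where column 2 minus column 0 is positive.
    bbox = bbox.view(-1, 4)
    bb_keep = ((bbox[:, 2] - bbox[:, 0]) > 0).nonzero()[:, 0]
    return bbox[bb_keep], clas[bb_keep]

def get_y_simplified(bbox, clas):
    # Proposed version: keep rows whose third coordinate is positive.
    keep = bbox[:, 2] > 0
    return bbox[keep], clas[keep]

# Toy target for one image: two all-zero padding rows and two real boxes.
bbox = torch.tensor([[0., 0., 0., 0.],
                     [0., 0., 0., 0.],
                     [10., 20., 110., 150.],
                     [30., 40., 90., 200.]])
clas = torch.tensor([0, 0, 7, 12])

bb_a, cl_a = get_y_original(bbox, clas)
bb_b, cl_b = get_y_simplified(bbox, clas)
same_on_padding = torch.equal(bb_a, bb_b) and torch.equal(cl_a, cl_b)

# A degenerate box (column 2 equal to column 0, both positive) is where they differ:
# the original drops it, the simplified version keeps it.
degenerate = torch.tensor([[50., 20., 50., 150.]])
deg_clas = torch.tensor([3])
n_original = get_y_original(degenerate, deg_clas)[0].shape[0]
n_simplified = get_y_simplified(degenerate, deg_clas)[0].shape[0]
```

On all-zero padding the two versions agree. They only diverge for boxes whose third and first coordinates are equal and positive, which should not occur in clean annotations but might be worth keeping in mind.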

I tried the modified version on VOC2007 and the train/validation losses look better.

For background, I found the following in Hiromi Suenaga’s notes on Lesson 9:

A lot of code you find on the internet does not work with mini-batches. It only does one thing at a time which we don’t want. In this case, all these functions (get_y, actn_to_bb, map_to_ground_truth) is working on, not exactly a mini-batch at a time, but a whole bunch of ground truth objects at a time. The data loader is being fed a mini-batch at a time to do the convolutional layers. Because we can have different numbers of ground truth objects in each image but a tensor has to be the strict rectangular shape, fastai automatically pads it with zeros (any target values that are shorter) [1:11:08]. This was something that was added recently and super handy, but that does mean that you then have to make sure that you get rid of those zeros. So get_y gets rid of any of the bounding boxes that are just padding.
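The padding described in that passage can be imitated in a few lines. This is only a sketch: torch.nn.utils.rnn.pad_sequence stands in for whatever fastai does internally (it pads at the end, which may not match fastai), and the boxes and class ids are invented for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical ground truth: image 0 has one box, image 1 has three.
boxes = [torch.tensor([[10., 20., 110., 150.]]),
         torch.tensor([[5., 5., 50., 60.],
                       [30., 40., 90., 200.],
                       [70., 10., 120., 80.]])]
classes = [torch.tensor([7]), torch.tensor([2, 12, 14])]

# Zero-pad so that both images fit in one rectangular tensor.
bbox_batch = pad_sequence(boxes, batch_first=True)    # shape (2, 3, 4)
clas_batch = pad_sequence(classes, batch_first=True)  # shape (2, 3)

def get_y(bbox, clas):
    # Drop the all-zero padding rows for a single image.
    keep = bbox[:, 2] > 0
    return bbox[keep], clas[keep]

# Strip the padding image by image and count the boxes that remain.
counts = [get_y(b, c)[0].shape[0] for b, c in zip(bbox_batch, clas_batch)]
```

After stripping, each image is back to its own number of boxes, which is why get_y has to run per image rather than on the whole padded batch at once.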