Jeremy’s implementation of get_y is as follows:

    def get_y(bbox, clas):
        bbox = bbox.view(-1, 4)  #/sz
        bb_keep = ((bbox[:, 2] - bbox[:, 0]) > 0).nonzero()[:, 0]
        return bbox[bb_keep], clas[bb_keep]

I wonder if this can be simplified as:

    def get_y(bbox, clas):
        keep = bbox[:, 2] > 0
        return bbox[keep], clas[keep]

Basically, if the third column of bbox is not positive, the corresponding row is just padding and will not be kept. Also, bbox.view(-1, 4) is a no-op here, as the last dimension of bbox is already 4. I'm not sure what the reason was for dividing by sz (the commented-out #/sz in the original).
I tried the modified version on VOC2007 and the train/validation losses look better.
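
To make the comparison concrete, here is a minimal sketch that runs both versions on a small zero-padded example. The names get_y_original and get_y_simplified and the tensor values are made up for illustration, and it assumes bbox already arrives with shape (N, 4), which is the assumption behind the "no-op" remark above.

```python
import torch

def get_y_original(bbox, clas):
    # Version quoted above: keep rows where column 2 minus column 0 is positive
    bbox = bbox.view(-1, 4)
    bb_keep = ((bbox[:, 2] - bbox[:, 0]) > 0).nonzero()[:, 0]
    return bbox[bb_keep], clas[bb_keep]

def get_y_simplified(bbox, clas):
    # Proposed version: keep rows whose third column is positive
    keep = bbox[:, 2] > 0
    return bbox[keep], clas[keep]

# Hypothetical targets for one image: two all-zero padding rows, then two real boxes
bbox = torch.tensor([[0., 0., 0., 0.],
                     [0., 0., 0., 0.],
                     [10., 20., 110., 200.],
                     [50., 60., 90., 120.]])
clas = torch.tensor([0, 0, 7, 12])

bb_a, cl_a = get_y_original(bbox, clas)
bb_b, cl_b = get_y_simplified(bbox, clas)

# Both versions drop the padding rows and keep the same boxes and classes
assert torch.equal(bb_a, bb_b) and torch.equal(cl_a, cl_b)
print(bb_b.shape, cl_b.tolist())
```

One difference worth keeping in mind: a degenerate box whose first and third columns are equal and positive would be dropped by the original check but kept by the simplified one, so the two only agree when every real box has positive extent along that axis.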
As background, I found the following in Hiromi Suenaga’s notes on Lesson #9:
A lot of code you find on the internet does not work with mini-batches. It only does one thing at a time which we don’t want. In this case, all these functions (get_y, actn_to_bb, map_to_ground_truth) is working on, not exactly a mini-batch at a time, but a whole bunch of ground truth objects at a time. The data loader is being fed a mini-batch at a time to do the convolutional layers. Because we can have different numbers of ground truth objects in each image but a tensor has to be the strict rectangular shape, fastai automatically pads it with zeros (any target values that are shorter) [1:11:08]. This was something that was added recently and super handy, but that does mean that you then have to make sure that you get rid of those zeros. So get_y gets rid of any of the bounding boxes that are just padding.
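
The padding described in the notes can be illustrated with a small sketch. pad_targets is a hypothetical helper written for this post, not fastai's actual collation code, and the position of the zero rows (here at the end) is an assumption; the mask-based filter does not depend on where the padding sits.

```python
import torch

def pad_targets(boxes_per_image):
    """Hypothetical helper: zero-pad (n_i, 4) box tensors to a common length and stack them."""
    max_n = max(b.shape[0] for b in boxes_per_image)
    out = torch.zeros(len(boxes_per_image), max_n, 4)
    for i, b in enumerate(boxes_per_image):
        out[i, :b.shape[0]] = b  # real boxes first, remaining rows stay zero
    return out

# Made-up ground truth: image 0 has one object, image 1 has three
boxes = [
    torch.tensor([[10., 20., 110., 200.]]),
    torch.tensor([[5., 5., 50., 60.],
                  [30., 40., 90., 120.],
                  [0., 10., 20., 30.]]),
]

batch = pad_targets(boxes)  # rectangular tensor of shape (2, 3, 4)

# Per image, count the rows that survive the "third column > 0" filter
counts = [int((img[:, 2] > 0).sum()) for img in batch]
print(tuple(batch.shape), counts)
```

Filtering each image's rows this way recovers the original object counts, which is the job get_y does before the boxes are matched to anchors.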