Fixing code for bounding boxes

I have been trying to implement a system for training bounding boxes for my own projects, but was having trouble because the code breaks whenever a bounding box isn’t present or has been pushed off the image by a transformation.

I have fixed this and can now show batches, and the tests pass as well. But I’m getting lost in all the processors and ItemLists, and I haven’t had time to work out why I hit some of the errors I did. I’d like some code review of what I’m doing to make sure the logic makes sense.

The first problem was with ImageBBox’s create method. When there is just one bounding box, bboxes is supplied as an np.array of dtype object with a list as its sole element. This caused problems later with collation. The solution was to add an intermediate step that enforces that the bounding boxes are a floating point numpy array. It is unclear to me why the tiny pascal example worked while a separate dataset with the same formatting did not.
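
A minimal sketch of that coercion step, written in plain Python rather than NumPy so it runs on its own. `normalize_bboxes` is a hypothetical helper, not part of fastai; it assumes every box is four numbers and that the input is a single flat box, a list of boxes, or a one-element wrapper around a list of boxes.

```python
from typing import List

def normalize_bboxes(raw) -> List[List[float]]:
    """Return boxes as a list of four-element float lists (hypothetical helper)."""
    boxes = list(raw)
    if not boxes:
        return []
    # A single flat box such as [10, 20, 30, 40] becomes [[10, 20, 30, 40]].
    if all(isinstance(v, (int, float)) for v in boxes):
        boxes = [boxes]
    # A one-element wrapper such as [[[10, 20, 30, 40]]] is unwrapped once.
    elif len(boxes) == 1 and boxes[0] and not isinstance(boxes[0][0], (int, float)):
        boxes = list(boxes[0])
    out = [[float(v) for v in box] for box in boxes]
    if any(len(box) != 4 for box in out):
        raise ValueError("every bounding box needs exactly four coordinates")
    return out
```

With NumPy available, the same idea would presumably collapse to converting the input to a float array and reshaping it to four columns before it reaches collation.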

Next there was a problem with ImageBBox’s data property. When there is no valid bounding box, the labels become empty. In my case I forced it to take on a value of tensor([0]). This feels wrong, but it seems to work fine for me because categories are labeled starting at 1, so 0 worked as an okay dummy value.

Last was a problem with bb_pad_collate, which fails if there are missing bounding boxes. My solution was just to populate the bounding box with zeros if there was no data provided.
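
The padding idea can be sketched without torch or fastai. `pad_boxes` below is a hypothetical stand-in for the padding step only (it is not fastai’s bb_pad_collate): each sample is assumed to be a (boxes, labels) pair, padding goes at the front to mirror the `[i, -len(lbls):]` indexing quoted later in this thread, and a sample with no boxes becomes all padding.

```python
from typing import List, Tuple

Box = List[float]

def pad_boxes(samples: List[Tuple[List[Box], List[int]]], pad_idx: int = 0):
    """Pad every sample's boxes and labels to the longest sample in the batch."""
    max_len = max((len(lbls) for _, lbls in samples), default=0)
    max_len = max(max_len, 1)  # keep at least one slot so an all-empty batch still has a shape
    bboxes, labels = [], []
    for bbs, lbls in samples:
        n_pad = max_len - len(lbls)
        # Zero boxes and pad_idx labels fill the front; real entries sit at the end.
        bboxes.append([[0.0] * 4 for _ in range(n_pad)] + [list(b) for b in bbs])
        labels.append([pad_idx] * n_pad + list(lbls))
    return bboxes, labels
```

Using 0 as the padding label only works because real categories start at 1, as noted above; otherwise padding would be indistinguishable from a real class.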

Ideally, passing in dummy values shouldn’t be necessary, but for now this is working for me. Does anyone have suggestions on how to improve on this? I’ve spent some time exploring the different ItemLists and trying to follow the logic. Maybe someone can point me to a forum post that makes this clearer.

Erik, do you have an example of the format of the labels for a bounding box problem?

As for your dummy value problem, couldn’t you just not include the sample if the bbox size is 0?

remove_rows = []
for i, s in enumerate(samples):
  bbs, lbls = s[1].data
  if bbs.size(0) == 0:
    # no boxes for this sample: mark its row for removal and skip it
    remove_rows.append(i)
    continue
  bboxes[i, -len(lbls):] = bbs
  labels[i, -len(lbls):] = lbls
# remove rows from bboxes and labels using the remove_rows list
# (the image batch needs the same good_rows filtering to stay aligned)
remove_rows = set(remove_rows)
good_rows = torch.LongTensor([r for r in range(bboxes.size(0)) if r not in remove_rows])
bboxes, labels = bboxes.index_select(0, good_rows), labels.index_select(0, good_rows)

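# first return value reconstructed on the assumption that it matches fastai's bb_pad_collate
return torch.cat(imgs, 0), (bboxes, labels)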

Thank you @erikg for your suggestions.
I followed your suggestions and then applied both your and @dhpollack’s solutions for the collate error.
I still have an error here:

File "/home/pietro/.local/lib/python3.6/site-packages/fastai/vision/", line 55, in bb_pad_collate
    labels[i,-len(lbls):] = lbls
TypeError: can't assign a numpy.ndarray to a torch.LongTensor

any suggestions?

Since I made this fix in the fastai code, the COCO object detection example (which was mysteriously working before) also gives me this numpy.ndarray vs LongTensor problem.

@dhpollack any clues?

to fix this error I just rolled back to:

lbls = tensor([o.data for o in lbls]) if lbls is not None else None

that was causing the lbls to be of the wrong type…

@dhpollack @pietro.latorre
Sorry it took me so long to get back to this thread. I finally got back to it and got single bounding box detection working. Here’s some code:

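# module paths were missing from the post; these fastai v1 modules are assumed
from fastai.vision import *
from fastai.vision.learner import cnn_config, num_features_model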

This is just changing create_cnn to only output 5 values. (bbox and whether the object is detected in the image)

def create_single_bbox(data:DataBunch, arch:Callable, cut:Union[int,Callable]=None, pretrained:bool=True,
                lin_ftrs:Optional[Collection[int]]=None, ps:Floats=0.5,
                custom_head:Optional[nn.Module]=None, split_on:Optional[SplitFuncOrIdxList]=None,
                bn_final:bool=False, **learn_kwargs:Any)->Learner:
    "Build convnet style learners."
    meta = cnn_config(arch)
    body = create_body(arch, pretrained, cut)
    nf = num_features_model(body) * 2
    # The only line that changed is the one below
    head = custom_head or create_head(nf, 5, lin_ftrs, ps=ps, bn_final=bn_final)
    model = nn.Sequential(body, head)
    learn = Learner(data, model, **learn_kwargs)
    if pretrained: learn.freeze()
    apply_init(model[1], nn.init.kaiming_normal_)
    return learn

This is a hack to make reconstruct work from the above defined network.

class MyObjectCategoryList(ObjectCategoryList):
    def reconstruct(self, t, x):
        if not isinstance(t, list):
            t = [t[0:4].view(-1, 4), (t[-1].view(-1, 1) > 0).long()]
        (bboxes, labels) = t
        if len((labels - self.pad_idx).nonzero()) == 0: return
        i = (labels - self.pad_idx).nonzero().min()
        bboxes,labels = bboxes[i:],labels[i:]
        return ImageBBox.create(*x.size, bboxes, labels=labels, classes=self.classes, scale=False)

My problem came from a medical dataset. The csv was just two columns: filenames and a string of four values for the bounding box. If the box for an image is ‘0 0 0 0’, I give it a background class. More on that in a second; I use a weird loss function to deal with it.

df = pd.read_csv(PATH / 'bb.csv')
d = {Path(k).name: [float(i) for i in v.split(' ')] for k, v in zip(df['scan id'], df['bb'])}

def get_heart_y(o):
    # d is keyed by file name, so look up the item's name (assumed: o is a Path)
    y = d[o.name]
    if all(i == 0 for i in y):
        return [[y], ['background']]
    return [[y], ['left atrium']]

Following how the docs do object detection from coco, I do almost the same thing except use a different label_cls. (While also using correct path and splitting functions.)

db = ObjectItemList.from_folder(PATH / 'scan-slice-128')
db = db.split_by_valid_func(lambda x: int(x.stem) < 1907)
db = db.label_from_func(get_heart_y, label_cls=MyObjectCategoryList)
db = db.transform(get_transforms(), tfm_y=True)
db = db.databunch(bs=16, collate_fn=bb_pad_collate)

The last challenge was the loss function. The logic is essentially: if the bounding box is ‘0,0,0,0’, don’t include the regression error in the loss, and add a binary cross entropy loss for the class of interest. (As a cheap hack, I use the 0/1 target values to mask away the bbox losses I don’t want.)

def single_bbox_loss(preds, bboxes, targs):
    pred_bboxes = preds[:, :4]
    pred_targs = preds[:, -1]
    bbox_loss = F.smooth_l1_loss(pred_bboxes, bboxes[:, 0, ...], reduction='none').mean(dim=-1)
    bbox_loss = (bbox_loss * targs[:, 0].float()).sum() / targs.sum().float()
    binary_loss = F.binary_cross_entropy_with_logits(pred_targs, targs.flatten().float())
    return bbox_loss + binary_loss
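
To make the masking arithmetic concrete, here is a plain-Python sketch of the same computation on nested lists, with no torch dependency. All names in it are hypothetical. One deliberate difference from the loss above: a batch made up only of background images would give 0/0 in the bbox term, so this sketch clamps the number of positives to at least 1. That guard is my assumption, not part of the posted loss.

```python
import math
from typing import List

def smooth_l1(pred: float, target: float) -> float:
    """Smooth L1 with threshold 1: quadratic near zero, linear beyond."""
    diff = abs(pred - target)
    return 0.5 * diff * diff if diff < 1.0 else diff - 0.5

def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def masked_single_bbox_loss(preds: List[List[float]],
                            bboxes: List[List[float]],
                            targs: List[int]) -> float:
    """preds rows are [4 box values, 1 logit]; targs is 1 for object, 0 for background."""
    box_terms = []
    for p, b, t in zip(preds, bboxes, targs):
        per_box = sum(smooth_l1(pi, bi) for pi, bi in zip(p[:4], b)) / 4.0
        box_terms.append(per_box * t)  # background rows contribute 0
    n_pos = max(sum(targs), 1)  # assumed guard against a batch with no objects
    bbox_loss = sum(box_terms) / n_pos
    binary_loss = sum(bce_with_logits(p[-1], t) for p, t in zip(preds, targs)) / len(preds)
    return bbox_loss + binary_loss
```

In the torch version, the equivalent guard would presumably be a clamp on the denominator so that it never drops below 1.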

And that’s it. Plug it into a learn object and let it go. Hooking things up to more classes in a real YOLO-type network should be pretty easy. I did a lot of silly things just to make the problem go faster, but it was a quick way for me to get it to work. Most of the code is just hijacking current fastai functions.

learn = create_single_bbox(db, models.resnet34, loss_func=single_bbox_loss)

Instead of defining the create_single_bbox method, couldn’t you just create the custom head and pass that to the existing create_cnn method and save some code?

For resnet34:

custom_head = create_head(1024, 5, None, .5, False)