BBox Convention Clarification

I’m looking at the bbox documentation and not quite sure what this means:

bboxes: list of bboxes (each of those being four integers with the top, left, bottom, right convention)

What is the top, left, bottom, right convention? I’ve tried [Top-Left, Top-Right, Bottom-Left, Bottom-Right] and I’ve tried [Top-Left, Bottom-Left, Bottom-Right, Top-Right].

Both of these options give me:

IndexError                                Traceback (most recent call last)
<ipython-input-119-53c88e1e9b9a> in <module>()
----> 1 ImageBBox.create([[[bbox[0],bbox[2],bbox[1],bbox[3]]]],*img.size)

~/fastai_v1/fastai/kaggle/Competitions/fastai/vision/ in create(cls, bboxes, h, w, labels, pad_idx)
    220         pxls = torch.zeros(len(bboxes),h, w).long()
    221         for i,bbox in enumerate(bboxes):
--> 222             pxls[i,bbox[0]:bbox[2]+1,bbox[1]:bbox[3]+1] = 1
    223         bbox = cls(pxls.float())
    224         bbox.labels,bbox.pad_idx = labels,pad_idx

IndexError: list index out of range

You just want a list of lists (because one image can have multiple objects), not three levels of nesting like you have right now. The convention is to pass a list of [top, left, bottom, right] boxes (each value being an integer). You can see an example of its use in the docs:

img = open_image('imgs/car_bbox.jpg')
bbox = ImageBBox.create([[96, 155, 270, 351]], *img.size)

Note that this part is still under construction (I’m actually working on implementing RetinaNet), so it’s not entirely stable right now.
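If your annotations come in the common (x_min, y_min, x_max, y_max) order instead, a tiny conversion gets them into fastai’s [top, left, bottom, right] order (the helper name here is just my own, for illustration):

```python
def xyxy_to_tlbr(box):
    """Convert (x_min, y_min, x_max, y_max) into fastai's [top, left, bottom, right].

    top/bottom are y values, left/right are x values.
    """
    x_min, y_min, x_max, y_max = box
    return [y_min, x_min, y_max, x_max]

# The same box as in the docs example, given as corner coordinates:
print(xyxy_to_tlbr((155, 96, 351, 270)))  # [96, 155, 270, 351]
```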


I was also playing around with that, and at first I wasn’t quite sure either and had to try a few things.

But it is like Sylvain/the docs say: instead of passing a list of (x, y) points (x = horizontal, y = vertical coordinate) for the four corners, you just give the bounds, i.e., top y (= top), left x (= left), bottom y (= bottom), right x (= right):

|     = left
┌───┐ = top
│   │
└───┘ = bottom
    | = right

Hope this visualization helps you to see some nice bounding boxes! :slight_smile:
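Put another way, any set of (x, y) corner points reduces to those four bounds with min/max. A minimal sketch in plain Python (the helper name is my own):

```python
def corners_to_tlbr(points):
    """Reduce a list of (x, y) corner points to [top, left, bottom, right] bounds."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(ys), min(xs), max(ys), max(xs)]

# All four corners of the docs example box:
corners = [(155, 96), (351, 96), (155, 270), (351, 270)]
print(corners_to_tlbr(corners))  # [96, 155, 270, 351]
```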

EDIT: Ok, you solved it later here ImageBBox.create Error - TypeError: slice indices must be integers or None or have an __index__ method


Hey, just a heads up if you are working with RetinaNet: it doesn’t seem to be giving good scores on Pascal VOC. More data augmentation might help, I’m not sure.

Better to train on the COCO dataset if you can.

I know this thread is old, but none of the replies helped me understand what format each box needed to be in. It took a bit of guessing and testing to get it working, so I’m including my solution here.

I use dataclasses to help organize the data, and the following are the (relevant) parts that convert the parsed annotations into an ImageBBox.

import pathlib
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Image:
    path: pathlib.Path

    # width, height
    size: Tuple[int, int] = None

    # bbox_x, bbox_y, bbox_width, bbox_height (COCO-style)
    bounding_box: Tuple[int, int, int, int] = None

    def bbox(self):
        top_left = tuple(reversed(self.bounding_box[:2]))  # y, x
        size = tuple(reversed(self.bounding_box[2:]))  # h, w
        bottom_right = top_left[0] + size[0], top_left[1] + size[1]  # y, x
        return ImageBBox.create(
            [[*top_left, *bottom_right]],
            *reversed(self.size),  # ImageBBox.create expects h, w
        )

From there I can show the image with the bounding box:

i = open_image(image.path)
i.show(y=image.bbox())
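The xywh → [top, left, bottom, right] arithmetic inside `bbox()` above can be checked in plain Python, without fastai (this is just the coordinate math, not the `ImageBBox.create` call):

```python
def coco_xywh_to_tlbr(bounding_box):
    """Convert COCO-style (x, y, width, height) into fastai's [top, left, bottom, right]."""
    x, y, w, h = bounding_box
    top_left = (y, x)                                            # y, x
    size = (h, w)                                                # h, w
    bottom_right = (top_left[0] + size[0], top_left[1] + size[1])  # y, x
    return [*top_left, *bottom_right]

# A 196x174 box anchored at (x=155, y=96) matches the docs example:
print(coco_xywh_to_tlbr((155, 96, 196, 174)))  # [96, 155, 270, 351]
```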