I’m looking at the bbox documentation and not quite sure what this means:
bboxes: list of bboxes (each of those being four integers with the top, left, bottom, right convention)
What is the top, left, bottom, right convention? I’ve tried [Top-Left, Top-Right, Bottom-Left, Bottom-Right] and I’ve tried [Top-Left, Bottom-Left, Bottom-Right, Top-Right].
Both of these options give me:
IndexError Traceback (most recent call last)
<ipython-input-119-53c88e1e9b9a> in <module>()
----> 1 ImageBBox.create([[[bbox[0],bbox[1],bbox[2],bbox[3]]]],*img.size)
~/fastai_v1/fastai/kaggle/Competitions/fastai/vision/image.py in create(cls, bboxes, h, w, labels, pad_idx)
    220         pxls = torch.zeros(len(bboxes),h, w).long()
    221         for i,bbox in enumerate(bboxes):
--> 222             pxls[i,bbox[0]:bbox[2]+1,bbox[1]:bbox[3]+1] = 1
    223         bbox = cls(pxls.float())
    224         bbox.labels,bbox.pad_idx = labels,pad_idx
IndexError: list index out of range
You just want a list of lists (because one image can have multiple objects), not three levels of nesting like you have right now. The convention is to pass a list of [top, left, bottom, right] boxes (each of those being an integer). You can see examples of use in the docs:
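That extra nesting level is also what produces the IndexError: each `bbox` the loop sees is itself a list containing a list, so indexing past position 0 fails before any pixel gets set. A minimal sketch of the failure in plain Python, no fastai needed (the exact indices used inside `create` are an assumption on my part):

```python
# One nesting level too many: bboxes should be a list of
# [top, left, bottom, right] lists, not a list of lists of lists.
bboxes = [[[96, 155, 270, 351]]]

for bbox in bboxes:
    # bbox is [[96, 155, 270, 351]] (length 1), so bbox[2] fails
    try:
        top, bottom = bbox[0], bbox[2]
    except IndexError as e:
        print(e)  # list index out of range

# With the correct two levels, unpacking works fine:
for bbox in [[96, 155, 270, 351]]:
    top, left, bottom, right = bbox
```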
img = open_image('imgs/car_bbox.jpg')
bbox = ImageBBox.create([[96, 155, 270, 351]], *img.size)
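If you start from corner points rather than bounds, getting to this format is just a min/max over the coordinates. A small sketch (`corners_to_tlbr` is a made-up helper name, not part of fastai):

```python
def corners_to_tlbr(points):
    """Turn a list of (x, y) corner points into the
    [top, left, bottom, right] list that ImageBBox.create expects."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    return [min(ys), min(xs), max(ys), max(xs)]

# The four corners of the car box from the docs example above:
corners = [(155, 96), (351, 96), (155, 270), (351, 270)]
print(corners_to_tlbr(corners))  # [96, 155, 270, 351]
```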
Note that this part is still under construction (I’m actually working on implementing RetinaNet), so it’s not entirely stable right now.
I was also playing around with that and at first I also was not quite sure and had to try different things.
But it is like Sylvain/the docs say: instead of passing a list of (x, y) points (x = horizontal, y = vertical coordinate) for the four corners, you just give the bounds of the box, i.e., the upper y (= top), the left x (= left), the lower y (= bottom), and the right x (= right):

                 left            right
                   |                |
        top  --->  +----------------+
                   |                |
     bottom  --->  +----------------+

Hope this visualization helps you to see some nice bounding boxes!
EDIT: Ok, you solved it later here ImageBBox.create Error - TypeError: slice indices must be integers or None or have an __index__ method
Hey, just a heads up if you are working with RetinaNet: it doesn’t seem to give good scores on Pascal VOC. More data augmentation might help, but I’m not sure.
Better to train on the COCO dataset if you can.
I know this thread is old but none of the replies helped me understand what format each box needed to be in. It took a bit of guessing and testing to get it working so I’m including my solution here.
I used dataclasses to help organize the data, and the following are the (relevant) parts that convert what was parsed into an ImageBBox:
# width, height
size: Tuple[int, int] = None
# bbox_x bbox_y bbox_width bbox_height
bounding_box: Tuple[int, int, int, int] = None
top_left = tuple(reversed(self.bounding_box[:2])) # y, x
size = tuple(reversed(self.bounding_box[2:])) # h, w
bottom_right = top_left[0] + size[0], top_left[1] + size[1] # y, x
From there I can show the image with the bounding box:
i = open_image(image.path)
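For anyone else landing here: the coordinate shuffling above can be collapsed into one standalone function (the name `xywh_to_tlbr` and the assumption that (bbox_x, bbox_y) is the top-left corner in pixels are mine):

```python
def xywh_to_tlbr(x, y, w, h):
    """Convert an (x, y, width, height) box, with (x, y) the top-left
    corner in pixels, into fastai's [top, left, bottom, right] list."""
    return [y, x, y + h, x + w]

# A 196x174 box whose top-left corner sits at x=155, y=96:
print(xywh_to_tlbr(155, 96, 196, 174))  # [96, 155, 270, 351]
```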