I’m looking at the bbox documentation and not quite sure what this means:
bboxes: list of bboxes (each of those being four integers with the top, left, bottom, right convention)
What is the top, left, bottom, right convention? I’ve tried [Top-Left, Top-Right, Bottom-Left, Bottom-Right] and I’ve tried [Top-Left, Bottom-Left, Bottom-Right, Top-Right].
Both of these options give me:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-119-53c88e1e9b9a> in <module>()
----> 1 ImageBBox.create([[[bbox[0],bbox[2],bbox[1],bbox[3]]]],*img.size)
~/fastai_v1/fastai/kaggle/Competitions/fastai/vision/image.py in create(cls, bboxes, h, w, labels, pad_idx)
220 pxls = torch.zeros(len(bboxes),h, w).long()
221 for i,bbox in enumerate(bboxes):
--> 222 pxls[i,bbox[0]:bbox[2]+1,bbox[1]:bbox[3]+1] = 1
223 bbox = cls(pxls.float())
224 bbox.labels,bbox.pad_idx = labels,pad_idx
IndexError: list index out of range
You just want a list of lists (because one image can have multiple objects), not three levels of nesting like you have right now. The convention is to pass a list of [top, left, bottom, right] boxes (each of those values being an integer). You can see examples of use in the docs.
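To make the nesting point concrete, here is a minimal pure-Python sketch of the loop shown in the traceback. `fill_masks` is a hypothetical stand-in written for illustration, not fastai's code; it only assumes that each box is indexed as bbox[0]..bbox[3] with inclusive ends, as on line 222 of the traceback.

```python
from typing import List, Sequence


def fill_masks(bboxes: Sequence[Sequence[int]], h: int, w: int) -> List[List[List[int]]]:
    """Hypothetical stand-in for the loop in the traceback (not fastai code).

    Builds one h x w mask of 0/1 per box. Each box is
    [top, left, bottom, right], with both ends inclusive.
    """
    masks = [[[0] * w for _ in range(h)] for _ in bboxes]
    for i, bbox in enumerate(bboxes):
        # Same indexing as the traceback: four entries are expected per box.
        top, left, bottom, right = bbox[0], bbox[1], bbox[2], bbox[3]
        for y in range(top, min(bottom + 1, h)):
            for x in range(left, min(right + 1, w)):
                masks[i][y][x] = 1
    return masks


# Two levels of nesting (a list of boxes): works.
masks = fill_masks([[1, 1, 2, 3]], h=4, w=5)
for row in masks[0]:
    print(row)

# Three levels of nesting: each "box" is a one-element list, so indexing fails.
try:
    fill_masks([[[1, 1, 2, 3]]], h=4, w=5)
except IndexError as err:
    print("IndexError:", err)
```

With the extra level, each item the loop sees is a list holding a single box, so asking it for a second or third entry raises the same "list index out of range" error as in the question.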
I was also playing around with that and at first I also was not quite sure and had to try different things.
But it is like Sylvain and the docs say: instead of passing a list of points with x, y values (x = horizontal, y = vertical coordinate) for the four corners, you just give the bounds, in this order: upper y (= top), left x (= left), lower y (= bottom), right x (= right):
|     = left
┌───┐ = upper
│   │
└───┘ = lower
    | = right
Hope this visualization helps you to see some nice bounding boxes!
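If you start from corner points, collapsing them into the four bounds can be sketched as below. `corners_to_tlbr` is a hypothetical helper name, and the sketch assumes the usual image coordinates where y grows downward, so the top edge is the smallest y.

```python
from typing import Iterable, List, Tuple


def corners_to_tlbr(points: Iterable[Tuple[int, int]]) -> List[int]:
    """Collapse corner points given as (x, y) into [top, left, bottom, right].

    Assumes image coordinates: x grows to the right, y grows downward.
    """
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return [min(ys), min(xs), max(ys), max(xs)]


# Four corners as (x, y); the order of the corners does not matter.
corners = [(10, 20), (50, 20), (10, 80), (50, 80)]
box = corners_to_tlbr(corners)
print(box)  # [20, 10, 80, 50]
```

The result is a single box; wrap it in one more list before passing it on, since the API expects a list of boxes.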
Hey, just a heads up if you are working with retina. It seems not to be giving good scores on pascal voc. More data augmentations might help, I am not sure.
I know this thread is old but none of the replies helped me understand what format each box needed to be in. It took a bit of guessing and testing to get it working so I’m including my solution here.
I use dataclasses to help organize the data, and the following are the relevant parts that convert what was parsed into an ImageBBox.
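from dataclasses import dataclass
from functools import cached_property  # Python 3.8+
import pathlib
from typing import Tuple

from fastai.vision import ImageBBox  # assuming fastai v1

@dataclass
class Image:
    path: pathlib.Path
    # width, height
    size: Tuple[int, int] = None
    # bbox_x bbox_y bbox_width bbox_height
    bounding_box: Tuple[int, int, int, int] = None

    @cached_property
    def bbox(self):
        top_left = tuple(reversed(self.bounding_box[:2]))  # y, x
        size = tuple(reversed(self.bounding_box[2:]))  # h, w
        bottom_right = top_left[0] + size[0], top_left[1] + size[1]  # y, x
        return ImageBBox.create(
            self.size[1],  # image height
            self.size[0],  # image width
            [[*top_left, *bottom_right]],  # one box: [top, left, bottom, right]
        )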
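The arithmetic in that property can be checked without fastai. This is a minimal sketch with a hypothetical helper, `xywh_to_tlbr`, that performs the same (x, y, width, height) to [top, left, bottom, right] conversion on plain integers.

```python
from typing import List


def xywh_to_tlbr(x: int, y: int, width: int, height: int) -> List[int]:
    """Convert (x, y, width, height) to [top, left, bottom, right].

    Hypothetical helper: (x, y) is taken to be the top-left corner.
    """
    return [y, x, y + height, x + width]


# A box whose top-left corner is at x=10, y=20, 40 pixels wide and 60 tall.
print(xywh_to_tlbr(10, 20, 40, 60))  # [20, 10, 80, 50]
```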
From there I can show the image with the bounding box: