What's the proper way to handle bounding box transformations?

I’m trying to replicate the bounding box regressor from 2018’s lesson 8 but using fastai v1.

I’m not able to get the bounding boxes transformed.
I’m using a file with the format:

image_path, x0 y0 x1 y1

What I’m trying:

df = pd.DataFrame( {'fn': [ i[0] for i in imgs ] , 
                    'bb': [ ' '.join(str(p) for p in i[1]) for i in imgs]
                   }, columns = ['fn', 'bb'])
df.to_csv('dataset.csv', index=False)

data = ImageList.from_csv(path='.', 
                         csv_name='dataset.csv', 
                         folder='.',
                         )
data = data.split_by_rand_pct()
data = data.split_none()
data = data.label_from_df(cols=['bb'], label_cls=FloatList)
data = data.transform(None, size=224, resize_method= ResizeMethod.SQUISH, tfm_y=True)

I’m using None as the first argument of transform since I only want to resize the images without cropping them.

If I show data I get the following:

ImageDataBunch;

Train: LabelList (1727 items)
x: ImageList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: FloatList
[ 35.  72. 153. 172.],[344. 238. 416. 312.],[  0.  66. 121. 182.],[ 25.  59. 161. 197.],[ 16.  23. 483.  91.]
Path: .;

These are exactly the same bounding boxes as in the dataset. Some of them even fall outside the resized images (344 > 224, 238 > 224).

What I’ve tried so far:

  1. Different values for tfm_y (True/TfmCoord/TfmPixel)
  2. Getting the bounding boxes coordinates from a function
  3. Getting the bounding boxes coordinates from different columns of the dataframe
  4. Combinations of 1,2,3 with and without label_cls=FloatList

What’s the right way to get the bounding boxes transformed properly?

Fastai has item lists made specifically to handle bounding boxes, like ObjectItemList or ObjectCategoryList. Does it work better with one of them?

Doing:

data = (ObjectItemList.from_csv(path='.', 
                         csv_name='dataset.csv', 
                         folder='.',
                         )
        .split_none()
        .label_from_df(cols=[ 'x0', 'y0', 'x1', 'y1'], label_cls=FloatList)
        .transform(None, resize_method = ResizeMethod.SQUISH, size=224, tfm_y=True)
        .databunch(bs=64) )

This still produces the same bounding boxes, unchanged from the original ones. I’m using None for the tfms because I don’t want any modification other than the squish, but if I replace None with get_transforms() I get:

It's not possible to apply those transforms to your dataset:
Not implemented: you can't apply transforms to this type of item (FloatItem)

ObjectCategoryList is not solving the problem either.

That’s because it needs label_cls to be ObjectCategoryList (which is the default class for object detection). This is a list that contains objects of type ImageBBox, which can handle transformations. You can probably extend the class for your use case, where you seem to have only one class, with something like:

class ObjectList(ObjectCategoryList):
    "`ItemList` for labelled bounding boxes."
    _processor = ObjectCategoryProcessor

    def get(self, i):
        # I imagine self.items[i] returns only one bounding box here, as the list of columns['x0', 'y0', 'x1', 'y1']
        return ImageBBox.create(*_get_size(self.x,i), self.items[i], labels=[self.classes[0]], classes=self.classes, pad_idx=self.pad_idx)

You can then create your list:

data = (ObjectItemList.from_csv(path='.', 
                         csv_name='dataset.csv', 
                         folder='.',
                         )
        .split_none()
        .label_from_df(cols=[ 'x0', 'y0', 'x1', 'y1'], label_cls=ObjectList, classes=['whatever'])
        .transform(None, resize_method = ResizeMethod.SQUISH, size=224, tfm_y=True)
        .databunch(bs=64) )

Does this work? Another option is to use a custom function to get the bounding boxes: given the path to your image, it returns a list of all bounding boxes contained in the image (each a list of 4 values) and a list of the corresponding labels (which may all be the same). Then you can simply use the basic ObjectCategoryList by loading the labels using get_y_func (there is a sample use case in the docs).
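
A minimal sketch of such a labelling function, assuming a hypothetical img2bb dictionary that maps each image file name to one box and one class (the file names and labels here are made up for illustration). The point is the return shape: a two-element list holding the list of boxes and the list of labels.

```python
from pathlib import Path

# Hypothetical lookup: file name -> (bounding box, class label).
img2bb = {
    "img_001.jpg": ([35.0, 72.0, 153.0, 172.0], "cat"),
    "img_002.jpg": ([344.0, 238.0, 416.0, 312.0], "dog"),
}

def get_y_func(path):
    """Return [boxes, labels] for one image: a list of 4-value boxes
    and a list with one label per box."""
    bbox, label = img2bb[Path(path).name]
    return [[bbox], [label]]

boxes, labels = get_y_func("./img_001.jpg")
```

Returning only the boxes, without the second list of labels, would leave the item with a single element, which is one plausible way to end up with an index error when the labels are read.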

Thanks for the answer @florobax and sorry for the delay. The code provided is not working for me. I’m getting an 'int' object is not iterable when trying to label the data.

Using a function to label the data only works if I use FloatList as label_cls. The following code works properly but the bounding boxes are not transformed:

img2bb = { i[0]:i[1] for i in imgs }
get_y_func = lambda x: [img2bb[x[4:]]]
data = ObjectItemList.from_csv(path='.', 
                         csv_name='dataset.csv',
                         folder='.'
                         )
data = data.split_none()
data = data.label_from_func(get_y_func, label_cls=FloatList)
data = data.transform(None, resize_method = ResizeMethod.SQUISH, size=224, tfm_y=True)
data = data.databunch(bs=64)

If I replace the line:
data = data.label_from_func(get_y_func, label_cls=FloatList)

by

data = data.label_from_func(get_y_func, label_cls=ObjectCategoryList)

I get the error IndexError: index 1 is out of bounds for axis 0 with size 1 on the line:
data = data.label_from_func(get_y_func, label_cls=ObjectCategoryList)
This is the same error I get when using the custom class.

My ultimate goal is to do a bounding box regressor and also a class prediction, so I don’t need to use a random class, I have a column with the class as well. I’ve tried setting them as well with no luck.

Any other ideas? Otherwise I think I’ll manually resize all images and bounding boxes and skip the data augmentation step.
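
For reference, the manual fallback mentioned above amounts to scaling each coordinate by the ratio between the new and the original image size. This is only a sketch, assuming boxes in x0 y0 x1 y1 pixel format and a squish to a square image; squish_bbox is a hypothetical helper name, not a fastai function.

```python
def squish_bbox(bbox, orig_w, orig_h, size=224):
    """Rescale an (x0, y0, x1, y1) box from an orig_w x orig_h image
    to a size x size image resized without preserving aspect ratio."""
    x0, y0, x1, y1 = bbox
    sx, sy = size / orig_w, size / orig_h  # independent x and y scale factors
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]

# Example with a hypothetical 448x448 source image: every coordinate is halved.
scaled = squish_bbox([344, 238, 416, 312], orig_w=448, orig_h=448)
```

After this rescaling every coordinate lies inside the 224x224 image, which is a quick sanity check for the resized dataset.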

I think you can stick to label_from_df, I just made a mistake on ObjectList, the get function should actually be:

    def get(self, i):
        # I imagine self.items[i] returns only one bounding box here, as the list of columns['x0', 'y0', 'x1', 'y1']
        return ImageBBox.create(*_get_size(self.x,i), [self.items[i]], labels=[self.classes[0]], classes=self.classes, pad_idx=self.pad_idx)

If at some point you want to do multiclass, and there is a label column in your csv with only one bounding box per image (multiple boxes per image are also possible, but I’d need to know how your csv is laid out), you can change the whole thing to:

class ObjectList(ObjectCategoryList):
    "`ItemList` for labelled bounding boxes."
    _processor = ObjectCategoryProcessor

    def get(self, i):
        # I imagine self.items[i] returns only one bounding box here, as the list of columns + label['x0', 'y0', 'x1', 'y1', 'label']
        bb = self.items[i]
        return ImageBBox.create(*_get_size(self.x,i), [bb[:-1]], labels=[bb[-1]], classes=self.classes, pad_idx=self.pad_idx)

data = (ObjectItemList.from_csv(path='.', 
                         csv_name='dataset.csv', 
                         folder='.',
                         )
        .split_none()
        .label_from_df(cols=[ 'x0', 'y0', 'x1', 'y1', 'label'], label_cls=ObjectList, classes=['whatever'])
        .transform(None, resize_method = ResizeMethod.SQUISH, size=224, tfm_y=True)
        .databunch(bs=64) )

I’m pretty sure something along these lines should work. If it doesn’t, I’ll gladly help but I might need a notebook of yours or something similar to have a better idea of what’s going on.

Thanks once again @florobax. I’ve tried that and I’m still having the same problem (basically getting an 'int' object is not iterable error). I’ve created a minimal example with images and uploaded it to this notebook. If you could take a look, that would be great.

Once again, thanks for all your help :slight_smile:

I think I’m starting to understand. ObjectCategoryList expects each item to be a collection of 2 elements:

  • the first one is a list of all bounding boxes in the image (each bb being an array of 4 ints)
  • the second one is a list of the corresponding labels

So for this case, where there is only one bb in each image, this should work:

class ObjectList(ObjectCategoryList):
    "`ItemList` for labelled bounding boxes."
    _processor = ObjectCategoryProcessor
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items = [[[item[:-1]], [item[-1]]] for item in self.items]
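
To make the target structure concrete, here is a small standalone sketch of the same list comprehension applied to two made-up rows (the labels "cat" and "dog" are hypothetical). It assumes each raw item is an array of x0, y0, x1, y1 followed by the label, roughly as it comes out of the dataframe.

```python
import numpy as np

# Hypothetical raw items: x0, y0, x1, y1, label (mixed types, so dtype=object).
rows = np.array([[35, 72, 153, 172, "cat"],
                 [344, 238, 416, 312, "dog"]], dtype=object)

# Each item becomes [list_of_boxes, list_of_labels], with one box per image.
items = [[[row[:-1]], [row[-1]]] for row in rows]

boxes, labels = items[0]
```

As far as I know, fastai v1's own detection examples store each box as top, left, bottom, right, so data in x0 y0 x1 y1 order may need reordering; checking the result visually with show_batch is a good idea.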

Still no luck. I think the bounding box needs to be a different type (probably a FloatList) in order to accept the transformation since now I’m getting the error:

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, and uint8.

I think I’ll resize the images manually along with the bounding boxes.

No, it needs to be an ObjectCategoryList, I can assure you. The problem here is that pandas imports the elements as object type, whereas they need to be float to be converted to a tensor. Something you can do is:

class ObjectList(ObjectCategoryList):
    "`ItemList` for labelled bounding boxes."
    _processor = ObjectCategoryProcessor
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items = [[[item[:-1].astype('float')], [item[-1]]] for item in self.items]

If that doesn’t work, you can also try:

class ObjectList(ObjectCategoryList):
    "`ItemList` for labelled bounding boxes."
    _processor = ObjectCategoryProcessor
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items = [[[[float(i) for i in item[:-1]]], [item[-1]]] for item in self.items]

This is my guess to make it work, but if it doesn’t, I’d like to see the full error stack trace so I can see where the error precisely comes from.
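
The dtype issue can be reproduced outside fastai with plain NumPy. This is a sketch with a made-up row: because the row mixes numbers and a string label, it has dtype object, and so does the slice holding the four coordinates, until it is converted explicitly.

```python
import numpy as np

# A row mixing numbers and a string label can only be stored as dtype=object.
row = np.array([35, 72, 153, 172, "cat"], dtype=object)

bbox_obj = row[:-1]                        # still dtype=object after slicing
bbox_float = bbox_obj.astype("float")      # first variant: numeric float64 array
bbox_list = [float(v) for v in row[:-1]]   # second variant: plain Python floats

print(bbox_obj.dtype, bbox_float.dtype)
```

Both variants give numeric values that can be turned into a tensor, while the object-typed slice is what triggers the "can't convert np.ndarray of type numpy.object_" error.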

That worked! Specifying astype('float') did the trick.

Thanks for all your help @florobax! I appreciate it a lot.

No problem !
