Multiple regression for bbox only in fastai v1

Thanks @ilovescience.
Using your idea of storing the 4 label coordinates in 4 separate columns of the .csv, it finally worked for me with the PointsItemList class.

Slowly but surely I am getting used to Python and the fastai library :slight_smile:


But actually, after training there is still something wrong.
When I call learn.show_results(),
it just shows the column names above the images and does not draw anything on them.

And also another topic: how can I show my data as images?
In this lesson Jeremy uses the command:

md.val_ds.denorm(to_np(x))[0]

but such functions no longer exist in fastai 1.0.
So how do I transform the x data into an image that I can show on the screen?

Here is another thread discussing this: How to predict multiple regressions from a model?

In the v1 framework, I have not found a straightforward way to predict multiple targets unless the data match the head pose dataset. It would be awesome to have an example case that is not coordinate-based, but multitask. An example would be predicting dog height and weight from an image. Or a mixed-data-type case where you predict a class and a continuous variable, e.g., breed and weight.


I managed to make it work using the PointsItemList class, creating the data like this with each predicted label in a separate column. The first column in the table holds the image file names; a made-up sketch of such a csv follows the code below.

data = (PointsItemList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .split_by_rand_pct()
        .label_from_df(cols=['b1','b2','b3','b4'])
        .transform(get_transforms(), size=sz)
        .databunch(bs=bs).normalize(imagenet_stats)
       ) 
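
For reference, here is a hypothetical sketch of how tmp/bb.csv could be laid out for this pipeline; the file names and coordinate values are made up, and treating the columns as the (y, x) coordinates of the top-left and bottom-right corners is my assumption:

# hypothetical sketch of the expected csv: the image file names (relative to the
# 'train' folder) in the first column, then one column per bounding-box coordinate
import pandas as pd
bb = pd.DataFrame({'name': ['000012.jpg', '000017.jpg'],
                   'b1': [96, 61],   'b2': [155, 184],   # assumed: top-left corner (y, x)
                   'b3': [269, 198], 'b4': [350, 278]})  # assumed: bottom-right corner (y, x)
bb.to_csv(path/'tmp/bb.csv', index=False)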

I still see a problem after all my struggles to make this notebook work with fastai 1.0.
The bounding box coordinates were given in the dimensions of the original pictures, but training happens on resized pictures (224x224).
I do not see the bounding box coordinates being transformed anywhere.

@ilovescience
Also, I cannot use .transform(get_transforms(), tfm_y=True, size=sz). With tfm_y=True
it gives me this error:
Exception: It's not possible to apply those transforms to your dataset:
Not implemented: you can't apply transforms to this type of item (MultiCategory)

Any idea how to overcome this?


I am confused here as to what the error is. Is the problem caused by tfm_y?

Also, the transform function should transform the label values to compensate for the different image size if tfm_y=True.

I found this:

So try this:

data = (ImageItemList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .split_by_rand_pct()
        .label_from_df(cols=['b1','b2','b3','b4'], label_cls=FloatList)
        .transform(get_transforms(), size=sz)
        .databunch(bs=bs).normalize(imagenet_stats)
       )

The only problem is that I don't think it would do the proper transforms on the label values, so you probably couldn't do any rotation transforms.

The alternative would be to use PointsItemList, but have the columns be the four coordinates of the two corner points, rather than one corner point plus width and height. Maybe because it is being passed a point plus height and width, it is getting confused and not recognizing them as ImagePoints?


Thanks for trying to help.
I still have not managed to make this notebook work properly.
I tried to use the PointsItemList class and then either .label_from_df() or .label_from_func(),
although in the latter case I am not sure exactly what type of data I should return from that function.

I do not use width and height, but the coordinates of a second corner.

So far I have not managed to create a DataBunch with the tfm_y=True option either way.
Without it, it seems that I can make the system train, but because the images are resized for training, the bounding box coordinates will be wrong by definition.

I have switched to NLP topics for now, but I will try to come back and see if I can still make this work a little bit later.
I will try your recommendations too.
If you discover some more useful info in the meantime, please let me know.

Thanks for puzzling through this! Maybe it’s obvious to others, but I’ve found that the data source need not be a PointsItemList. For example, ImageList also works:

data = (ImageList.from_df(df=df, path=path)
        .split_by_idx(valIdxs)
        .label_from_df(label_cls=FloatList, log=False, cols=allSp)
        .transform(tfms, size=64)
        .databunch(bs=32))

The key seems to be specifying the label columns with the cols argument.
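
For completeness, a minimal sketch of how a learner might then be trained on such a regression DataBunch; the choice of resnet34 and MSELossFlat here is my assumption, not something from the post above:

from fastai.vision import *   # brings in cnn_learner, models, MSELossFlat, etc.

learn = cnn_learner(data, models.resnet34, loss_func=MSELossFlat())  # plain MSE on the float labels
learn.fit_one_cycle(5)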


While this is true for most image regression problems, it is not as helpful for bounding box regression/object detection. This is because the PointsItemList supports transformations for both the image and the points, so the image can be resized, rotated, etc. with the points also transformed. However, the above discussion showed that we were having some difficulties doing so.

If the label value does not change when the image is transformed, then the approach you outlined works fine.


Ah, yes. I was fixated on my own application, which was not bounding box related; apologies. This thread has been helpful for other problems.

I did this using ObjectItemList -> Bounding box regression using resnet34 in fastai v1
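
For anyone curious, a rough sketch of what the ObjectItemList route generally looks like in fastai v1; get_y_func here is a hypothetical function returning ([[y1, x1, y2, x2], ...], ['label', ...]) for each image, and the details are in the linked thread:

data = (ObjectItemList.from_folder(path/'train')
        .split_by_rand_pct()
        .label_from_func(get_y_func)                         # bounding boxes plus class labels per image
        .transform(get_transforms(), tfm_y=True, size=224)   # tfm_y=True keeps the boxes in sync
        .databunch(bs=16, collate_fn=bb_pad_collate))        # pad batches to the same number of boxes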


Pretty much in the same place. Did you manage to find a solution to this? I understand that the goal is to communicate two things as we create the DataBunch (listed below the code):

db_with_coord = (ImageList.from_df(dbox_df, JPEGS_PATH)
                 .split_by_rand_pct(valid_pct=0.2)
                 .label_from_df(label_cls=FloatList, cols=['b1', 'b2', 'b3', 'b4'])
                 .transform(get_transforms(do_flip=1, max_rotate=30, max_lighting=0.1),
                            size=224, tfm_y=True)
                 .databunch(bs=16)
                )
  1. That it's a regression problem on 4 columns, which we do by setting label_cls to FloatList in the .label_from_df call.
  2. Ensuring the bounding box coordinates are updated with the transformations by setting tfm_y=True in the .transform call.
However, this fails with:
Exception: It's not possible to apply those transforms to your dataset:
 Not implemented: you can't apply transforms to this type of item (FloatItem)

The fact that the error says the transforms cannot be applied to a FloatItem seems to be due to an incorrect use of ImageList on my part, although using ImageList seemed intuitive given that we are applying image transformations.

Using a PointsItemList as suggested by some of the comments returns a negative training and validation loss for me.

Did you ever find a solution to the problem of transforming the bounding box coordinates? I am now stuck at exactly this problem.

I think I finally found the solution. I used the technique implemented in the head_pose notebook of the course. The idea is to give the model the coordinates of the top-left and bottom-right points and let it learn! After training, you can obtain the coordinates and draw the rectangles as before.

data = (PointsItemList.from_csv(PATH, "tmp/bb.csv", folder="JPEGImages")
        .split_by_rand_pct()
        .label_from_func(label)
        .transform(tfm_y=True, size=(224,224))
        .databunch().normalize(imagenet_stats))

This is the API I used, and it transforms the coordinates as well when resizing the image.
Let me explain it. In "bb.csv", the first column is the image name and the second is a string containing the 4 coordinates of the top-left and bottom-right points of the bounding box.
label is a function which receives the file name, obtains the 4 coordinates by searching a df which maps image ids to coordinates, and returns a tensor containing the y,x of the two points (so the shape will be 2 by 2). A made-up example of this df is sketched after the code below.
def label(o):
    # look up the bbox string for this file name in the df
    bb = df[df["fn"] == str(o).split("/")[-1]].bbox.values[0]
    coord = [float(i) for i in bb.split()]
    # return a 2x2 tensor: (y, x) of the top-left and bottom-right points
    return tensor([coord[0], coord[1]], [coord[2], coord[3]])
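
To make the assumed data layout concrete, here is a hypothetical sketch of the df that label() searches; the column names fn and bbox match the code above, while the rows are made-up examples:

import pandas as pd

# each bbox is a space-separated string: "y_top_left x_top_left y_bottom_right x_bottom_right"
df = pd.DataFrame({'fn':   ['000012.jpg', '000017.jpg'],
                   'bbox': ['96 155 269 350', '61 184 198 278']})

# with the label() function above:
# label('JPEGImages/000012.jpg')  ->  a 2x2 tensor [[96., 155.], [269., 350.]]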

Once you have your data, it is almost straightforward: you make a cnn_learner without needing a custom head.
The model of course returns 4 coordinates in the (-1, 1) range; converting them back to the 224 by 224 scale is simple math, as sketched below.
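
For example, a minimal sketch of that simple math, assuming the outputs follow fastai's point scaling where -1 maps to 0 and +1 maps to the image size:

import torch

def unscale(pred, size=224):
    # map predictions in (-1, 1) back to pixel coordinates: -1 -> 0, +1 -> size
    return (pred + 1) * size / 2

print(unscale(torch.tensor([-0.5, 0.0, 0.5, 1.0])))  # tensor([ 56., 112., 168., 224.])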

I think it’s doing a great job! Let’s look at an example

BTW, if my explanation is not clear, tell me to clarify.

Hey,
this works pretty awesomely, thanks!
If by chance anyone is still implementing Lesson 8 of the 2018 course, here is a screenshot of the suggested solution.

Happy that it was helpful; it took me one whole day :slight_smile:

Can you please help me understand exp[0].data.permute(1,2,0)?

That's because in PyTorch and fastai the channel dimension is the first dimension of an image, but matplotlib wants it to be the last one.
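
A minimal, self-contained sketch of that difference, using a random tensor in place of an actual fastai image:

import torch
import matplotlib.pyplot as plt

img = torch.rand(3, 224, 224)      # PyTorch/fastai layout: (channels, height, width)
plt.imshow(img.permute(1, 2, 0))   # matplotlib expects (height, width, channels)
plt.show()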

Thanks for the help.