Multiple regression for bbox only in fastai v1

I’m trying to redo lesson 8 of the 2018 version of the course, which has a part where Jeremy predicts only the bounding box, as a regression on 4 numbers. This is achieved with:

md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True, bs=4)

In fastai v1 I can’t find a way to do regression with multiple targets. When using label_cls=FloatList, it treats the target as a single float per item.

Am I missing something? How can I use a single learner/model to predict a simple regression of 4 numbers?



I am doing the same lesson right now and have run into the same issue!

I tried to create the data like this, to adapt it to the 1.0 library:
data = (ImageList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .label_from_df(cols='bbox', label_delim=' ')
        .transform(get_transforms(), resize_method=ResizeMethod.SQUISH, size=sz)

When I then inspect data.train_ds.y,
I get: MultiCategoryList (2001 items)

I am not sure that this is the correct format for multi-label regression with the Learner.

Then I created a Learner as follows:
head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088, 4))
learn = cnn_learner(data, models.resnet34, custom_head = head_reg4, loss_func = nn.L1Loss(), metrics=error_rate)
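For anyone wondering where the 25088 in the custom head comes from: as far as I can tell, it is the flattened size of the resnet34 feature map for a 224x224 input (512 channels, downsampled by a factor of 32 to 7x7). A minimal sketch of that arithmetic, assuming a square input whose side is a multiple of 32 (the function name is only illustrative):

```python
def flat_features(image_size, channels=512, stride=32):
    """Length of the flattened feature map for a square input image.

    Assumes a body that outputs `channels` feature maps and downsamples
    by `stride`, and an image side that is a multiple of `stride`.
    """
    side = image_size // stride
    return channels * side * side

print(flat_features(224))  # 512 * 7 * 7
print(flat_features(128))  # 512 * 4 * 4
```

So a head hard-coded to 25088 inputs would only fit sz=224; any other size would need a different first dimension for nn.Linear.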

But when I try to train, I get the following error:
The size of tensor a (4) must match the size of tensor b (500) at non-singleton dimension 1

If you find a way forward, please let me know here.

Image regression was covered in Lesson 3. I would recommend watching that video to get started.

I watched that lesson, but it predicts a single point and the data is in a different format.
How do you convert it to multi-point regression?

Could you please post the code for creating a DataBunch for this Lesson 9 case?


This may be cheating, but here is a notebook for object detection that uses the ObjectItemList and a RetinaNet model:

I also saw this notebook, but first I would like to reimplement the intermediate step Jeremy did with 0.7 in lesson 9: computing a single-object bounding box with a regular ResNet model.
I think it should be a matter of calling the right DataBunch creation methods with the right parameters. I am trying to figure out how to do that.

You should be able to use PointsItemList where there are multiple floats as labels…

For example, if there is a csv with the image filename in one column, and columns for the height, width, x, y, then the code might be:

data = (PointsItemList.from_csv(path)
        .transform(get_transforms(), tfm_y=True, size=128)

Note that I have never done this myself, but the docs mention support for multiple labels/floats for regression.


Thanks! I will check it. The difference is that I currently have all 4 numbers separated by spaces in a single column, and they are integers, not floats. It also seems that my DataBunch creation does not work properly.

.label_from_df has a label_delim argument. If they are separated by a space you can set label_delim=' '

But when I call it like this:

data = (PointsItemList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .label_from_df(cols='bbox', label_delim=' ')
        .transform(get_transforms(), resize_method=ResizeMethod.SQUISH, size=sz)

I get an error: got an unexpected keyword argument 'label_delim'

When I do the same call with ImageList instead of PointsItemList:

data = (ImageList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .label_from_df(cols='bbox', label_delim=' ')
        .transform(get_transforms(), resize_method=ResizeMethod.SQUISH, size=sz)

I do not get an error, but the resulting y labels seem to be wrong.
For example, data.train_ds.y returns:

MultiCategoryList (2001 items)

But if I run:

x,y = next(iter(data.train_dl))

the labels come back with shape:

torch.Size([5, 500])

It seems to me that it treats them as one-hot encoded tensors of size 500.
That is not what I want in this case, and training subsequently fails because the labels have the wrong size.
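To illustrate why a delimiter turns the labels into long 0/1 vectors rather than 4 numbers, here is a small pure-Python sketch of multi-category encoding. It is not fastai's actual implementation, and the bbox strings are made up; it only shows the idea that every distinct token becomes a class:

```python
# Hypothetical bbox strings, as they might appear in a single 'bbox' CSV column.
rows = ["96 155 269 350", "61 184 198 278", "77 89 335 402"]

# With a delimiter, every distinct token is treated as a category name.
classes = sorted({tok for row in rows for tok in row.split(" ")})

def multi_hot(row, classes):
    """Encode one delimited label string as a 0/1 vector over all classes."""
    present = set(row.split(" "))
    return [1.0 if c in present else 0.0 for c in classes]

targets = [multi_hot(r, classes) for r in rows]
print(len(classes))     # number of distinct tokens across the dataset, not 4
print(len(targets[0]))  # each target is as long as the class vocabulary
print(sum(targets[0]))  # one entry set per distinct token in the row
```

With a few thousand images, the vocabulary of distinct coordinate values can easily reach several hundred, which would explain a label width like 500 against a model output of width 4.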

Thanks @ilovescience.
Using your idea of storing the 4 label coordinates in 4 separate columns of the .csv, it finally worked for me with the PointsItemList class.

Slowly but surely, I am getting used to Python and the library :)
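For reference, splitting the single space-delimited column into four numeric columns can be sketched with just the standard library. The column names (fn, bbox, y1, x1, y2, x2), the coordinate order and the sample rows are assumptions for illustration, not the actual contents of tmp/bb.csv:

```python
import csv
import io

# Hypothetical CSV contents: a filename column plus one space-delimited 'bbox' column.
raw = "fn,bbox\n000012.jpg,96 155 269 350\n000017.jpg,61 184 198 278\n"

def split_bbox_column(text, names=("y1", "x1", "y2", "x2")):
    """Return CSV text with the 'bbox' column expanded into four float columns."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fn", *names])
    for row in reader:
        coords = [float(v) for v in row["bbox"].split()]
        if len(coords) != len(names):
            raise ValueError(f"expected {len(names)} numbers, got {row['bbox']!r}")
        writer.writerow([row["fn"], *coords])
    return out.getvalue()

print(split_bbox_column(raw))
```

Converting to float at this stage also sidesteps the integer-versus-float question raised earlier in the thread.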


But after training, something is still wrong.
When I call learn.show_results(),
it just shows “column_names” above the images and does not draw anything on them.

And also another topic. How can I show my data as images?
In this lesson Jeremy uses a command:


but there are no such functions in 1.0 anymore.
So how do I convert the X data into an image that I can show on screen?

Here is another thread discussing this: How to predict multiple regressions from a model?

In the v1 framework, I have not found a straightforward way to predict multiple targets unless the data matches the head pose dataset. It would be awesome to have an example that is not coordinate-based but multi-task. One example would be predicting a dog's height and weight from an image. Another would be a mixed-type case where you predict a class and a continuous variable, e.g., breed and weight.


I managed to make it work using the PointsItemList class and the data creation below, with each predicted label in a separate column. The first column of the table holds the image file names.

data = (PointsItemList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .transform(get_transforms(), size=sz)

After all my struggles to make this notebook work with 1.0, I still see a problem.
The bounding box coordinates used for training are given in the dimensions of the original picture, but the model is trained on pictures resized to 224x224.
I do not see the bounding box coordinates being transformed anywhere.
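If the library does not transform the targets, one workaround is to rescale the coordinates by hand before training. A minimal sketch, assuming the image is squished (not cropped or padded) to the new size and that boxes are stored as (y1, x1, y2, x2); the function name and the numbers are illustrative only:

```python
def rescale_bbox(bbox, orig_size, new_size):
    """Scale (y1, x1, y2, x2) from an image of orig_size=(h, w) to new_size=(h, w).

    Only valid for a plain resize/squish; crops, padding, flips and
    rotations would need their own coordinate transforms.
    """
    y1, x1, y2, x2 = bbox
    sy = new_size[0] / orig_size[0]  # vertical scale factor
    sx = new_size[1] / orig_size[1]  # horizontal scale factor
    return (y1 * sy, x1 * sx, y2 * sy, x2 * sx)

# A box on a 400x500 (h x w) image, mapped to a 224x224 training image.
print(rescale_bbox((100, 50, 300, 250), orig_size=(400, 500), new_size=(224, 224)))
```

This only covers the resize; random augmentations from get_transforms() would still move the object without moving the hand-scaled box.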

I also cannot use .transform(get_transforms(), tfm_y=True, size=sz), i.e. with tfm_y=True.
It gives me an error:
Exception: It’s not possible to apply those transforms to your dataset:
Not implemented: you can’t apply transforms to this type of item (MultiCategory)

Any idea how to overcome this?


I am confused about what the error is. Is the problem caused by tfm_y?

Also, the transform function should transform the values to compensate for the different image size if tfm_y=True.

I found this:

So try this:

data = (ImageItemList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .transform(get_transforms(), size=sz)

Only problem is that I don’t think it would do the proper transforms on the values, so you probably couldn’t do any rotation transforms.

The alternative would be to use PointsItemList but have the columns be the four points, rather than just two points and width and height. Maybe because of passing just two points and height and width, it is getting confused and is not recognizing it as Image Points?


Thanks for trying to help.
I still have not managed to make this notebook work properly.
I tried to use the PointsItemList class with either .label_from_df() or .label_from_func(),
although in the latter case I am not sure exactly what type of data the function should return.

I do not use width and height, but the coordinates of a second corner.
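On the question of what a label function should return: as far as I remember, the lesson 3 head-pose notebook returns a tensor of points in (y, x) order, and fastai v1 appears to rescale such points to the [-1, 1] range internally. Treat both as assumptions to verify. Under those assumptions, a two-corner box could be expressed as two points like this (plain lists here; the helper names are made up, and the result would still need wrapping in a tensor):

```python
def bbox_to_points(bbox):
    """Turn (y1, x1, y2, x2) into two corner points, each in (y, x) order."""
    y1, x1, y2, x2 = bbox
    return [[y1, x1], [y2, x2]]

def to_unit_range(points, size):
    """Map pixel (y, x) points on an image of size=(h, w) into [-1, 1]."""
    h, w = size
    return [[2 * y / h - 1, 2 * x / w - 1] for y, x in points]

pts = bbox_to_points((100, 50, 300, 250))
print(pts)                                  # two (y, x) corner points in pixels
print(to_unit_range(pts, size=(400, 500)))  # same corners relative to the image
```

Because the unit-range form is relative to the image, it stays valid when the image is resized, which is the property missing when raw pixel coordinates are used as plain float labels.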

So far I have not managed to create a data bunch with the tfm_y=True option either way.
Without it, I can apparently make the system train, but because the training images are smaller, the bounding box coordinates will be wrong by definition.

I have switched to NLP topics for now, but I will try to come back a little later and see if I can still make this work.
I will try your recommendations too.
If you discover any more useful info in the meantime, please let me know.

Thanks for puzzling through this! Maybe it’s obvious to others, but I’ve found that the data source need not be a PointsItemList. For example, ImageList also works:

data = (ImageList.from_df(df = df, path = path)
 .label_from_df(label_cls=FloatList, log=False, cols=allSp)
 .transform(tfms, size=64)

The key seems to be specifying the label columns with the cols argument.


While this is true for most image regression problems, it is not as helpful for bounding box regression/object detection. This is because the PointsItemList supports transformations for both the image and the points, so the image can be resized, rotated, etc. with the points also transformed. However, the above discussion showed that we were having some difficulties doing so.

If the label value does not change when the image is transformed, then the approach you outlined is fine.
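A small sketch of that distinction, using a horizontal flip as the transform. It assumes integer pixel indices and boxes stored as (y1, x1, y2, x2); the function names and values are illustrative:

```python
def hflip_bbox(bbox, width):
    """Mirror (y1, x1, y2, x2) left-to-right on an image `width` pixels wide."""
    y1, x1, y2, x2 = bbox
    # The old right edge becomes the new left edge, and vice versa.
    return (y1, width - 1 - x2, y2, width - 1 - x1)

def hflip_scalar_targets(targets):
    """Targets such as a dog's height and weight do not depend on image geometry."""
    return targets

print(hflip_bbox((100, 50, 300, 250), width=500))  # the box moves with the image
print(hflip_scalar_targets((55.0, 23.5)))          # these labels stay the same
```

Geometry-independent targets can therefore be plain float labels, while coordinate targets need a label type that is transformed together with the image.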
