I’m trying to redo lesson 8 from the 2018 version of the course, which has a part where Jeremy predicts the bounding box using only a regression of 4 numbers. This is achieved with:
In fastai v1 I can’t find a way to do regression with multiple targets. When using label_cls = FloatList, it treats the target as a single float per item.
Am I missing something? How can I use a single learner/model to predict a simple regression of 4 numbers?
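For reference, one approach reported to work in fastai v1 is to put the four coordinates into separate float columns and pass a list of column names to label_from_df with label_cls=FloatList. This is a sketch, not a tested recipe: the DataFrame df, the path, and the column names y1/x1/y2/x2 are all hypothetical placeholders.

```python
# Hedged sketch for fastai v1; df, path, and the column names are assumptions.
# With the four coordinates in separate float columns, passing a list of columns
# to label_from_df with label_cls=FloatList yields a 4-number regression target
# per image instead of a single float or a one-hot category vector.
from fastai.vision import *

data = (ImageList.from_df(df, path, folder='train')
        .split_by_rand_pct()
        .label_from_df(cols=['y1', 'x1', 'y2', 'x2'], label_cls=FloatList)
        .transform(get_transforms(), size=224)
        .databunch().normalize(imagenet_stats))
```

Note this does not transform the coordinates together with the image, so it only makes sense with transforms that leave the target meaningful.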
I am doing exactly the same lesson now and ran into the same issue!
I tried to create the data this way, to adapt it to the fastai 1.0 library:
data = (ImageList.from_csv(path=path, folder='train', csv_name='tmp/bb.csv')
        .split_by_rand_pct()
        .label_from_df(cols='bbox', label_delim=' ')
        .transform(get_transforms(), resize_method=ResizeMethod.SQUISH, size=sz)
        .databunch().normalize(imagenet_stats)
       )
When I inspect the labels afterwards with data.train_ds.y, I get a MultiCategoryList (2001 items).
I am not sure this is the correct format for multi-label regression with the Learner.
Then I created a Learner as follows:
head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088, 4))
learn = cnn_learner(data, models.resnet34, custom_head = head_reg4, loss_func = nn.L1Loss(), metrics=error_rate)
But when I try to train, I get the following error: The size of tensor a (4) must match the size of tensor b (500) at non-singleton dimension 1
If you find a way to make progress, please post the answer here.
Thanks!
I also saw this notebook. But first I would like to reimplement Jeremy’s intermediate step from lesson 9, where he used fastai 0.7 to predict a single object’s bounding box with a regular ResNet model.
I think it should just be a matter of calling the right DataBunch creation methods with the right parameters. I am trying to figure out how to do that.
Thanks! I will check it. The difference is that I have all 4 numbers separated by spaces in a single column, and they are integers, not floats. It also seems that my DataBunch creation does not work properly.
It looks like it treats them as one-hot-encoded tensors of size 500, which is not what I want here; consequently, training fails because the labels have the wrong size.
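To illustrate what is going wrong, here is a plain-Python sketch with made-up values (the real CSV rows differ): label_delim=' ' makes the label list split each bbox string into tokens and treat every distinct token across the dataset as a class, so the target becomes a large one-hot vector (~500 classes here), while the custom head outputs only 4 numbers. What a regression target actually needs is the tokens parsed as floats.

```python
# Made-up example value; the real CSV rows differ.
bbox = "96 155 270 351"

# What label_delim=' ' effectively does: each token becomes a category label.
tokens = bbox.split(' ')
vocab = sorted({t for t in tokens})  # across the whole dataset this grows to ~500 classes
one_hot = [1 if t in tokens else 0 for t in vocab]

# What a 4-number regression target actually needs: the tokens parsed as floats.
coords = [float(t) for t in tokens]
print(coords)  # [96.0, 155.0, 270.0, 351.0]
```

That mismatch (a 4-number prediction against a ~500-dimensional one-hot target) is exactly the tensor-size error above.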
But even after training finishes, something is still wrong.
When I call learn.show_results(), it just shows "column_names" above the images and does not draw anything on them.
And another question: how can I display my data as images?
In this lesson Jeremy uses a command:
md.val_ds.denorm(to_np(x))[0]
but such functions no longer exist in fastai 1.0.
So how can I convert the X data into an image that I can show on screen?
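The arithmetic of denormalization is just the inverse of the ImageNet normalization: multiply by the channel std and add the channel mean. A minimal sketch in plain Python (in fastai v1 you can usually sidestep this, since an Image item from the dataset can be displayed directly with its show method):

```python
# ImageNet channel statistics used by normalize(imagenet_stats)
mean = [0.485, 0.456, 0.406]
std  = [0.229, 0.224, 0.225]

def denorm_pixel(px):
    """Undo per-channel normalization for one (r, g, b) pixel: x * std + mean."""
    return [p * s + m for p, s, m in zip(px, std, mean)]

print(denorm_pixel([0.0, 0.0, 0.0]))  # recovers the channel means
```

Applied over every pixel of a normalized tensor, this gives values back in the displayable [0, 1] range.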
In the v1 framework I have not found a straightforward way to predict multiple targets unless the data match the head-pose dataset. It would be awesome to have an example that is not coordinate-based but multi-task, such as predicting dog height and weight from an image, or a mixed-data-type case where you predict a class and a continuous variable, e.g. breed and weight.
I managed to make it work using the PointsItemList class and data creation like this, with each predicted label in a separate column. The first column of the table holds the image file names.
After all my fighting to make this notebook work with fastai 1.0, I still see a problem.
The bounding-box coordinates are given in the dimensions of the original picture, but training happens on images resized to 224x224.
I do not see the bounding-box coordinates being transformed anywhere.
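If nothing in the pipeline rescales the targets, it can be done by hand before training. A minimal sketch (the corner order top/left/bottom/right is an assumption; match it to how your CSV stores the boxes):

```python
def scale_bbox(bbox, orig_size, new_size=(224, 224)):
    """Rescale (top, left, bottom, right) pixel coordinates
    from orig_size (h, w) to new_size (h, w)."""
    top, left, bottom, right = bbox
    oh, ow = orig_size
    nh, nw = new_size
    return (top * nh / oh, left * nw / ow, bottom * nh / oh, right * nw / ow)

print(scale_bbox((10, 20, 210, 420), (448, 448)))  # (5.0, 10.0, 105.0, 210.0)
```

When the targets are ImagePoints and tfm_y=True works, fastai applies this rescaling for you, which is why getting tfm_y=True to work is the cleaner route.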
@ilovescience
Also, I cannot use .transform(get_transforms(), tfm_y=True, size=sz). With tfm_y=True it gives me an error:
Exception: It's not possible to apply those transforms to your dataset:
Not implemented: you can't apply transforms to this type of item (MultiCategory)
The only problem is that I don’t think it would apply the proper transforms to the values, so you probably couldn’t do any rotation transforms.
The alternative would be to use PointsItemList, but with the columns being the four corner coordinates rather than two points plus width and height. Maybe it is getting confused because you pass two points plus width and height, and so does not recognize them as image points?
Thanks for trying to help.
I still have not managed to make this notebook work properly.
I tried to use the PointsItemList class with either .label_from_df() or .label_from_func(),
although in the latter case I am not sure exactly what type of data I should return from the function.
I do not use width and height, but the coordinates of the second corner.
So far I have not managed to create a DataBunch with the tfm_y=True option either way.
Without it the system seems to train, but since the training images are smaller, the bounding-box coordinates are wrong by definition.
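For what it’s worth, here is a hedged sketch of the label_from_func route, modeled on the head-pose example from lesson 3. Everything dataset-specific is an assumption: img2bbox is a hypothetical dict mapping filename to (top, left, bottom, right), and the folder layout is made up. The key point is that the function should return a tensor of (row, col) point pairs, which fastai v1 wraps as ImagePoints so tfm_y=True can rescale them with the image.

```python
# Hedged sketch, fastai v1; img2bbox, path, and the corner order are assumptions.
from fastai.vision import *

def get_corners(fn):
    top, left, bottom, right = img2bbox[fn.name]
    # two opposite corners as (row, col) points, the format ImagePoints expects
    return tensor([[top, left], [bottom, right]]).float()

data = (PointsItemList.from_folder(path/'train')
        .split_by_rand_pct()
        .label_from_func(get_corners)
        .transform(get_transforms(max_rotate=0), tfm_y=True, size=224,
                   resize_method=ResizeMethod.SQUISH)
        .databunch().normalize(imagenet_stats))
```

Rotation is disabled here because an axis-aligned box represented by two corners stops being a valid box under rotation; that limitation was already noted above.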
I have switched to NLP topics for now, but I will try to come back and see if I can make this work a little bit later.
I will try your recommendations too.
If you discover any more useful info in the meantime, please let me know.
Thanks for puzzling through this! Maybe it’s obvious to others, but I’ve found that the data source need not be a PointsItemList. For example, ImageList also works:
While this is true for most image regression problems, it is not as helpful for bounding box regression/object detection. This is because the PointsItemList supports transformations for both the image and the points, so the image can be resized, rotated, etc. with the points also transformed. However, the above discussion showed that we were having some difficulties doing so.
If the label values do not change when the image is transformed, then the approach you outlined works fine.