Multi Object Detection in Fastai2 / Colab

Not sure if this is the right place to ask. I am trying to build a very simple model that works with my custom data set. I got my data to look like the COCO image dataset and I am testing on COCO_tiny to see if things work.

I am struggling to find where the RetinaNet or SSD models with custom heads have moved to in Fastai2 (version 2.5.6 is what I see on Colab right now…)

My data is like this:

takeoff = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                    get_items=get_image_files,
                    splitter=RandomSplitter(),
                    # first getter returns the 4-value bounding boxes, second returns the labels
                    get_y=[lambda o: img2bbox[o.name][0], lambda o: img2bbox[o.name][1]],
                    batch_tfms=aug_transforms(),
                    n_inp=1)
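
For readers following along, here is a minimal sketch of how a lookup like `img2bbox` could be built from a COCO-style annotation file. It assumes the standard COCO layout (`images`, `annotations`, `categories`, with boxes stored as `[x, y, width, height]`) and converts each box to corner format `[x1, y1, x2, y2]`. The function name `build_img2bbox` and the `sample` data are made up for illustration. As far as I know, fastai's `get_annotations` helper does a similar job.

```python
from collections import defaultdict

def build_img2bbox(coco):
    """Map file name -> ([[x1, y1, x2, y2], ...], [label, ...]) from a COCO-style dict."""
    id2name = {c["id"]: c["name"] for c in coco["categories"]}
    id2file = {im["id"]: im["file_name"] for im in coco["images"]}
    boxes, labels = defaultdict(list), defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
        fname = id2file[ann["image_id"]]
        boxes[fname].append([x, y, x + w, y + h])  # convert to corner format
        labels[fname].append(id2name[ann["category_id"]])
    return {fname: (boxes[fname], labels[fname]) for fname in boxes}

# Hypothetical annotation data, only to show the shape of the result
sample = {
    "categories": [{"id": 1, "name": "chair"}, {"id": 2, "name": "vase"}],
    "images": [{"id": 7, "file_name": "a.jpg"}],
    "annotations": [
        {"image_id": 7, "bbox": [10, 20, 30, 40], "category_id": 1},
        {"image_id": 7, "bbox": [0, 0, 5, 5], "category_id": 2},
    ],
}
img2bbox = build_img2bbox(sample)
print(img2bbox["a.jpg"])
```

With a real file, the dict would come from `json.load` on the annotation JSON. `img2bbox[name][0]` then gives the boxes and `img2bbox[name][1]` the labels, matching the two `get_y` getters.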

So I want a head that will predict the 4-values for the bbox and the category. Seems simple enough to build a resnet backbone and then the correct head. From re-watching the older videos, it seems that SSD or RetinaNet should have this as a standard output. And yet, I can’t find a code snippet that does this anywhere in the basic library for Fastai2.

I have found in the wwf (thank you @muellerzr !) library some tools that seem to get close to this by fitting a RetinaNet with FocalLoss. But even that seems to need a few extra steps to get things working.
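
To make the FocalLoss part concrete, here is a rough pure-Python sketch of the binary focal loss for a single prediction, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). This is not the wwf or fastai implementation. The function names are made up, and `alpha=0.25`, `gamma=2.0` are just the commonly quoted defaults from the RetinaNet paper.

```python
import math

def cross_entropy(p, target):
    """Plain binary cross-entropy for one prediction; p is the predicted probability of class 1."""
    p_t = p if target == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction (illustrative sketch, not a library API)."""
    p_t = p if target == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its weight
print(easy, hard)
```

The point of the `(1 - p_t) ** gamma` factor is that the many easy background anchors contribute very little, so the few hard examples dominate the gradient.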

I would guess that vision_learner would build this correctly, but I don’t get the right output head.

I found some posts about IceVision, which seem to indicate that it works with Fastai, but I can’t figure out how to connect to it and use it.

Any hints on a full working solution (with some COCO_tiny data, for example…) that can train the head to predict the bounding box + category?

Any hints/helps/repos/scripts that get most of the way there would be extremely helpful.

Hi there,

A couple of weeks back I also tried to do the exact same thing, but didn’t succeed. I found that most of the code from wwf comes from an older fastai repo (which I think was still using fastai 1). I also tried to use icevision, but found it not very well documented, and it also used an older torch version that didn’t allow a Paperspace A4000 to be used. Eventually I switched to YOLO (v5), which is working like a charm. It’s not super satisfying, since I have no idea what’s going on under the hood, but it gets the job done.

Thanks for a quick reply!
Can you share a code snippet with a simple example from Yolo v5 that works?

Just have a look at the docs, it’s pretty well documented.
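
One step that tends to come up when moving COCO-style data to YOLOv5 is converting the boxes. As a hedged sketch: YOLOv5 label files use one row per object in the form `class x_center y_center width height`, normalized to the image size, with zero-indexed classes. The helper names below are made up for illustration and are not part of the YOLOv5 repo.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x, y, width, height] box in pixels to normalized YOLO (xc, yc, w, h)."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

def yolo_label_line(class_idx, bbox, img_w, img_h):
    """Format one object as a row of a YOLO label .txt file."""
    xc, yc, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_idx} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Hypothetical example: a 30x40 box at (10, 20) in a 100x200 image, class 0
line = yolo_label_line(0, [10, 20, 30, 40], img_w=100, img_h=200)
print(line)
```

Each image then gets a `.txt` file with the same base name containing one such row per object.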

Also, I just came across this post: dhblog - Object Detection from scratch - Single Shot Detector which looks pretty interesting!
