Object detection in fast.ai v1

I am playing around with object detection and wondering how to do it in fast.ai v1. Are there notebooks in fastai_docs that I can look at? I found a mention of Yolo3 in vision.models:

Darknet architecture, which is the base of Yolo v3

So in the new version of the fast.ai course, will we use Yolo3 for object detection rather than SSD as in the last version? How can I get more information about it (for example, the notebook used for developing Yolo3)?

Thank you in advance,


No, we’ll use retina net. The dev notebook is here.


How do we get the COCO dataset used in 102a_coco? Is it right to download the fast.ai subset from https://course.fast.ai/datasets.html?

Thank you,

[Edit]: I used the data above and it seems OK. However, I ran into a problem at

x,y = train_tds[0]
x.show(y=y, classes=train_ds.classes, figsize=(6,4))

~/fastai/fastai/vision/transform.py in flip_lr(x)
     60 @TfmPixel
---> 61 def flip_lr(x): return x.flip(2)
     63 @TfmAffine

AttributeError: 'ImageBBox' object has no attribute 'flip'

I am using fast.ai 1.0.20

You should use flip_affine for bboxes or points; flip_lr isn’t implemented for them.
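
To illustrate why boxes need their own flip: when an image is mirrored left to right, each box's left and right edges have to be mirrored and swapped as well. Below is a minimal sketch in plain Python, assuming boxes are stored as (top, left, bottom, right) in pixel coordinates. `flip_boxes_lr` is a hypothetical helper for illustration only, not fastai's implementation of flip_affine.

```python
def flip_boxes_lr(boxes, img_width):
    """Mirror (top, left, bottom, right) pixel boxes across the vertical centre line."""
    flipped = []
    for top, left, bottom, right in boxes:
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append((top, img_width - right, bottom, img_width - left))
    return flipped


boxes = [(10, 5, 50, 30)]
print(flip_boxes_lr(boxes, img_width=100))  # [(10, 70, 50, 95)]
```

Flipping twice gives back the original boxes, which is a quick sanity check for any box transform of this kind.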


Thanks for the RetinaNet implementation. I was wondering if you could provide a link to the .pth files referenced in the notebook.

clas = torch.load(PATH/'models'/'tst_clas.pth')
regr = torch.load(PATH/'models'/'tst_regr.pth')

Those were just randomly generated models, I just wanted to reload the exact same ones for testing.

Did you make Retinanet work?

I was successful with creating the data object using the datablocks api … but now I am struggling with creating the model

model = RetinaNet(encoder, 6, -4)
ValueError Traceback (most recent call last)
1 encoder = create_body(tvm.resnet50(True), -2)
----> 2 model = RetinaNet(encoder, 6, -4)
3 model.eval()
4 x = torch.randn(2,3,256,256)
5 output = model(x)

<ipython-input-40-37554ab72d4b> in __init__(self, encoder, n_classes, final_bias, chs, n_anchors, flatten)
      6         self.n_classes,self.flatten = n_classes,flatten
      7         imsize = (256,256)
----> 8         sfs_szs,x,hooks = model_sizes(encoder, size=imsize)
      9         sfs_idxs = _get_sfs_idxs(sfs_szs)
     10         self.encoder = encoder

ValueError: too many values to unpack (expected 3)

I haven’t made it work yet. I’m still stuck with creating the ‘clas’ and ‘regr’ variables. Do you know how to create them?

I had the same error as you and replaced encoder:Model with encoder:nn.Module:

class RetinaNet(nn.Module):
    "Implements RetinaNet from https://arxiv.org/abs/1708.02002"
    def __init__(self, encoder:Model, n_classes, final_bias=0., chs=256, n_anchors=9, flatten=True):

to this:

class RetinaNet(nn.Module):
    "Implements RetinaNet from https://arxiv.org/abs/1708.02002"
    def __init__(self, encoder:nn.Module, n_classes, final_bias=0., chs=256, n_anchors=9, flatten=True):

Not sure if this is correct, though.

@m000ritz, @hkristen Were you able to make object detection work with fast.ai v1? There is no longer an ObjectDetectDataset in the fastai library. Are there any examples of this in fast.ai v1, whether RetinaNet or the previously used SSD?

Thank you

Hi @dhoa!

For the fastai coco subset I was able to create the data object with the datablocks api like this:

PATH = Path('.../coco_sample')
ANNOT_PATH = 'annotations'
images, lbl_bbox = get_annotations(PATH / ANNOT_PATH / 'train_sample.json')
img2bbox = dict(zip(images, lbl_bbox))
get_y_func = lambda o:img2bbox[o.name]
data = (ObjectItemList.from_folder(PATH / 'train_sample')
        #Where are the images?
        .random_split_by_pct()
        #How to split in train/valid? -> randomly with the default 20% in valid
        .label_from_func(get_y_func)
        #How to find the labels? -> use get_y_func
        .transform(get_transforms(), tfm_y=True, padding_mode='zeros', do_crop=False, size=128)
        #Data augmentation? -> Standard transforms with tfm_y=True
        .databunch(bs=16, collate_fn=bb_pad_collate))
        #Finally we convert to a DataBunch and we use bb_pad_collate
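
For anyone wondering what the images / lbl_bbox lists hold, here is a rough sketch of the kind of parsing involved. It assumes the annotation file follows the usual COCO layout (`images`, `annotations`, `categories`, with boxes stored as [x, y, width, height]) and that boxes are wanted as top, left, bottom, right. `parse_coco_annotations` and the sample dict are hypothetical; this is not the library's get_annotations.

```python
from collections import defaultdict


def parse_coco_annotations(coco):
    """Return (image_names, [[bboxes, labels], ...]) from a COCO-style dict."""
    id2name = {img["id"]: img["file_name"] for img in coco["images"]}
    id2cat = {cat["id"]: cat["name"] for cat in coco["categories"]}
    boxes, labels = defaultdict(list), defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
        # Convert to top, left, bottom, right.
        boxes[ann["image_id"]].append([y, x, y + h, x + w])
        labels[ann["image_id"]].append(id2cat[ann["category_id"]])
    ids = list(boxes.keys())
    return [id2name[i] for i in ids], [[boxes[i], labels[i]] for i in ids]


# Tiny made-up sample, only to show the shapes involved.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "categories": [{"id": 7, "name": "chair"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [10, 20, 30, 40]}],
}
images, lbl_bbox = parse_coco_annotations(sample)
img2bbox = dict(zip(images, lbl_bbox))
print(img2bbox["a.jpg"])  # [[[20, 10, 60, 40]], ['chair']]
```

Each value in img2bbox is a pair of (list of boxes, list of class names), which is what a labelling function like get_y_func hands back for one image.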

Then I am stuck at the error described above. I think I'll need more time, or maybe help from @sgugger, to resolve this :wink:

If you make any progress, please let us know!


Since the creation of the notebook, model_sizes has changed its behaviour. You can now use:
sfs_szs = model_sizes(encoder, size=imsize)
hooks = hook_outputs(encoder)
instead. That should work for you.
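
As a rough picture of what this means: assuming model_sizes now gives back a single list with one activation shape per encoder layer (which would explain why unpacking it into three names fails), the layers where the feature map shrinks can be picked out as below. The shapes are illustrative values for a ResNet-style encoder on a 256x256 input, and `size_change_indices` is a hypothetical stand-in, not the notebook's _get_sfs_idxs.

```python
def size_change_indices(sizes):
    """Indices i where the spatial size of layer i differs from layer i + 1."""
    widths = [s[-1] for s in sizes]
    return [i for i in range(len(widths) - 1) if widths[i] != widths[i + 1]]


# Illustrative activation shapes (batch, channels, height, width), one per layer.
example_sizes = [
    (1, 64, 128, 128), (1, 64, 128, 128), (1, 64, 128, 128), (1, 64, 64, 64),
    (1, 256, 64, 64), (1, 512, 32, 32), (1, 1024, 16, 16), (1, 2048, 8, 8),
]
print(len(example_sizes))                  # 8 entries, so `a, b, c = ...` would fail
print(size_change_indices(example_sizes))  # [2, 4, 5, 6]
```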

Thanks @msandroid! I am one step further now, but I am again running into an error that seems to come from changes in the fastai source code…

For anybody working on this, you can use my GIST as a start …


Thanks @hkristen. So I think playing with object detection in fast.ai is quite difficult right now, because examples are missing and the library is still changing. I will switch to reading last year's course with SSD, and I hope Jeremy will say something about it in the last session of part 1 v3.

No, it won’t be covered until part 2.


I think the conv2d_relu can be replaced by conv_layer. Or at least, that’s what I did, and it solved the issue.

I now have a notebook that runs up to and including the training part, but unfortunately the loss doesn’t go down when I try to train :frowning:

One more thing that I had to change in order to make the notebook run was to replace in the FocalLoss definition:

clas_loss = F.binary_cross_entropy_with_logits(clas_pred, encoded_tgt, weights, reduction='sum')

with:

weights_no_grad = weights.detach()
clas_loss = F.binary_cross_entropy_with_logits(clas_pred, encoded_tgt, weights_no_grad, reduction='sum')

Otherwise, PyTorch was trying to compute the gradient of the weights and raising an error…
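
For reference, the quantity being computed can be sketched without any framework. The focal loss from the RetinaNet paper linked above is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), that is, binary cross-entropy scaled by a per-example weight. The plain-Python version below handles a single prediction; `focal_loss` is an illustrative helper, not the notebook's FocalLoss class, and gamma=2, alpha=0.25 are the defaults reported in the paper.

```python
import math


def focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (target is 0 or 1)."""
    p = 1.0 / (1.0 + math.exp(-logit))           # sigmoid
    p_t = p if target == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # Per-example weight; this is the part passed as `weights` in the thread.
    weight = alpha_t * (1.0 - p_t) ** gamma
    return -weight * math.log(p_t)               # weighted binary cross-entropy


easy = focal_loss(4.0, 1)    # confident and correct: heavily down-weighted
hard = focal_loss(-4.0, 1)   # confident and wrong: keeps most of its loss
print(easy < hard)           # True
```

With gamma=0 the weight reduces to alpha_t, so the result is plain cross-entropy scaled by a constant.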

It’s frustrating, I feel like we’re close to making it work !!


I found a working object detection algorithm y’all will find interesting.


@divyansh and I have gotten object detection to work with fastai v1. The dev notebook is at https://github.com/rohitgeo/singleshotdetector

This creates a Dynamic SSD based on the number of grid cells, zoom levels and aspect ratios for the anchor boxes.
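
To give a feel for how the anchor boxes follow from those three settings, here is a small sketch in plain Python. It assumes anchors are centred on grid cells and scaled by each zoom and aspect ratio; `make_anchors` and the example values are hypothetical and are not taken from the linked repository.

```python
from itertools import product


def make_anchors(grids, zooms, ratios):
    """Return anchors as (cx, cy, w, h) in [0, 1] image coordinates."""
    anchors = []
    for g in grids:
        cell = 1.0 / g  # side length of one grid cell
        for row, col in product(range(g), repeat=2):
            cx, cy = (col + 0.5) * cell, (row + 0.5) * cell
            # One anchor per (zoom, aspect ratio) combination in every cell.
            for zoom, (rw, rh) in product(zooms, ratios):
                anchors.append((cx, cy, cell * zoom * rw, cell * zoom * rh))
    return anchors


anchors = make_anchors(
    grids=[4, 2, 1],
    zooms=[0.7, 1.0, 1.3],
    ratios=[(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)],
)
print(len(anchors))  # (16 + 4 + 1) cells * 3 zooms * 3 ratios = 189
```

The number of anchors, and so the size of the detection head's output, is fully determined by these three lists, which is presumably what makes the architecture "dynamic".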

We haven’t trained the network fully in the dev notebook… like unfreezing and fine-tuning, and using Focal Loss (which is supported) etc, but are getting good results already. Hope you find it useful!

See Dynamic SSD implementation for fastai v1 for more details.


So the code in fastai v1 for object detection will not be ready until part 2? I already see the YOLOv3 architecture in the docs… Is it not yet 100% ready?

There is no YoloV3 in the fastai library and what’s in preparation is Retina Net, but no, it won’t be ready before part 2.


I think that what you saw is that darknet is available in the fastai CV model zoo. YOLOv3 uses darknet so that’s why YOLOv3 is mentioned.