Object detection in fast.ai v1

(Dien Hoa TRUONG) #1

I am playing around with object detection and wondering how to do it in fast.ai v1. Are there notebooks in fastai_docs that I can look at? In vision.models I found Darknet, the architecture that YOLOv3 is built on.

So in the new version of the fast.ai course, will we use YOLOv3 to detect objects rather than SSD as in the previous version? How can I get information about that (e.g. the notebook used to develop YOLOv3)?

Thank you in advance,


No, we’ll use RetinaNet. The dev notebook is here.

(Dien Hoa TRUONG) #3

How do we get the COCO dataset used in 102a_coco? Is it right to download the fast.ai subset from https://course.fast.ai/datasets.html?

Thank you,

[Edit]: I used the data above and it seems OK. However, I ran into a problem at:

x,y = train_tds[0]
x.show(y=y, classes=train_ds.classes, figsize=(6,4))

~/fastai/fastai/vision/transform.py in flip_lr(x)
     60 @TfmPixel
---> 61 def flip_lr(x): return x.flip(2)
     63 @TfmAffine

AttributeError: 'ImageBBox' object has no attribute 'flip'

I am using fast.ai 1.0.20


You should use flip_affine for bboxes or points; flip_lr isn’t implemented for them.
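To illustrate why an affine flip works for boxes, here is a minimal pure-Python sketch (not fastai's implementation): it maps the box corners to normalized [-1, 1] coordinates, applies the horizontal-flip matrix [[-1,0,0],[0,1,0],[0,0,1]], maps back to pixels, and recomputes the min/max because the left and right edges swap. The (x_min, y_min, x_max, y_max) pixel box layout and the helper names are assumptions for this example only.

```python
# Hypothetical sketch: horizontal flip of a bounding box via an affine matrix.
# Box layout (x_min, y_min, x_max, y_max) in pixels is an assumption here.

FLIP_LR = [[-1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

def to_normalized(px, size):
    """Map a pixel coordinate in [0, size] to [-1, 1]."""
    return 2.0 * px / size - 1.0

def to_pixels(c, size):
    """Map a normalized coordinate in [-1, 1] back to [0, size]."""
    return (c + 1.0) * size / 2.0

def apply_affine(m, x, y):
    """Apply a 3x3 affine matrix to the point (x, y) in homogeneous coordinates."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def flip_bbox(box, width, height):
    """Return the horizontally flipped box, still as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    moved = []
    for x, y in corners:
        nx, ny = apply_affine(FLIP_LR, to_normalized(x, width), to_normalized(y, height))
        moved.append((to_pixels(nx, width), to_pixels(ny, height)))
    xs = [p[0] for p in moved]
    ys = [p[1] for p in moved]
    # The flip swaps the left and right edges, so take min/max again
    return (min(xs), min(ys), max(xs), max(ys))

print(flip_bbox((10, 20, 50, 80), width=100, height=100))
# ≈ (50, 20, 90, 80), up to floating-point rounding
```

Transforming points and then rebuilding the box is what makes the affine approach work for bboxes and points alike, whereas a pixel-level flip only makes sense for image tensors.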


Thanks for the RetinaNet implementation. I was wondering if you could provide a link to the .pth files referenced in the notebook.

clas = torch.load(PATH/'models'/'tst_clas.pth')
regr = torch.load(PATH/'models'/'tst_regr.pth')


Those were just randomly generated models; I only wanted to reload the exact same ones for testing.
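A minimal sketch of how such reproducible test fixtures could be produced, assuming clas and regr are plain tensors of classification and box-regression outputs. The shapes below are hypothetical, not taken from the notebook. Seeding the RNG and round-tripping through torch.save/torch.load gives identical values on every run; if they were modules instead, the same round trip applies to their state_dict().

```python
import tempfile
from pathlib import Path

import torch

# Hypothetical shapes: batch of 2, 1000 anchors in total, 6 classes, 4 box coords
bs, n_anchors_total, n_classes = 2, 1000, 6

torch.manual_seed(42)  # fixed seed so the "random" tensors are reproducible
clas = torch.randn(bs, n_anchors_total, n_classes)
regr = torch.randn(bs, n_anchors_total, 4)

with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / 'models'
    models.mkdir()
    torch.save(clas, models / 'tst_clas.pth')
    torch.save(regr, models / 'tst_regr.pth')
    # Reloading returns exactly the saved values
    clas2 = torch.load(models / 'tst_clas.pth')
    regr2 = torch.load(models / 'tst_regr.pth')

print(torch.equal(clas, clas2), torch.equal(regr, regr2))  # True True
```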


Did you get RetinaNet to work?

I succeeded in creating the data object with the data block API, but now I am struggling to create the model:

model = RetinaNet(encoder, 6, -4)
ValueError                                Traceback (most recent call last)
      1 encoder = create_body(tvm.resnet50(True), -2)
----> 2 model = RetinaNet(encoder, 6, -4)
      3 model.eval()
      4 x = torch.randn(2,3,256,256)
      5 output = model(x)

<ipython-input-40-37554ab72d4b> in __init__(self, encoder, n_classes, final_bias, chs, n_anchors, flatten)
      6         self.n_classes,self.flatten = n_classes,flatten
      7         imsize = (256,256)
----> 8         sfs_szs,x,hooks = model_sizes(encoder, size=imsize)
      9         sfs_idxs = _get_sfs_idxs(sfs_szs)
     10         self.encoder = encoder

ValueError: too many values to unpack (expected 3)
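The message means model_sizes returned more than three items, which suggests (an assumption; check the source of your installed fastai) that in your version it returns only the list of activation sizes rather than (sizes, x, hooks). As a fastai-independent illustration of what that computation does, here is a minimal sketch using plain PyTorch forward hooks; child_output_sizes is a hypothetical helper and the toy encoder stands in for create_body(tvm.resnet50(True), -2).

```python
import torch
import torch.nn as nn

def child_output_sizes(model: nn.Module, size=(256, 256)):
    """Run a dummy batch through `model` and return each direct child's output shape.

    Hypothetical stand-in for fastai's model_sizes; assumes the children
    run in order, as in an nn.Sequential body. Note: puts `model` in eval mode.
    """
    sizes = []

    def hook(module, inp, out):
        sizes.append(tuple(out.shape))

    handles = [m.register_forward_hook(hook) for m in model.children()]
    try:
        model.eval()
        with torch.no_grad():
            model(torch.zeros(1, 3, *size))
    finally:
        for h in handles:
            h.remove()  # detach hooks so later forward passes are unaffected
    return sizes

# Toy encoder (assumption), small enough to run quickly
encoder = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
)
for s in child_output_sizes(encoder):
    print(s)
```

With a 256×256 input each stride-2 convolution halves the spatial size, so the list goes from 128×128 to 64×64; this is the kind of size list the RetinaNet constructor inspects to decide where to attach its feature-pyramid connections.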


I haven’t gotten it to work yet. I’m still stuck on creating the ‘clas’ and ‘regr’ variables. Do you know how to create them?

I had the same error as you and changed encoder:Model to encoder:nn.Module, from this:

class RetinaNet(nn.Module):
    "Implements RetinaNet from https://arxiv.org/abs/1708.02002"
    def __init__(self, encoder:Model, n_classes, final_bias=0., chs=256, n_anchors=9, flatten=True):

to this:

class RetinaNet(nn.Module):
    "Implements RetinaNet from https://arxiv.org/abs/1708.02002"
    def __init__(self, encoder:nn.Module, n_classes, final_bias=0., chs=256, n_anchors=9, flatten=True):

Not sure if this is correct, though.
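On whether the change is safe: Python stores parameter annotations but does not enforce them at runtime, so encoder:nn.Module only documents the expected type and does not change how RetinaNet behaves. On Python versions before 3.14 (without from __future__ import annotations), the annotation's name must still exist when the class is defined, so an undefined Model would fail at that point; whether that was the original error is not stated in the thread. A tiny pure-Python sketch, with a hypothetical Module stand-in so it runs without PyTorch:

```python
class Module:
    """Hypothetical stand-in for torch.nn.Module (keeps the sketch dependency-free)."""

class RetinaNetSketch:
    """Toy class mirroring only the constructor signature discussed above."""
    def __init__(self, encoder: Module, n_classes, final_bias=0., chs=256, n_anchors=9, flatten=True):
        self.encoder, self.n_classes = encoder, n_classes

# The annotation is recorded...
print(RetinaNetSketch.__init__.__annotations__['encoder'].__name__)  # Module

# ...but never enforced: any object is accepted as `encoder`.
net = RetinaNetSketch(encoder="not a module", n_classes=6)
print(net.encoder)  # not a module
```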