Working notebook for Object Detection

Hi everyone,
I would like to have a working notebook with Object Detection.
I am building one, but I have run into many issues, both when reading the data and when running fit on it.
Because of these problems I read many posts, both on the forum and elsewhere, and I tried to run the only official example I found: here, which uses the COCO dataset. Unfortunately it only covers databunch creation and does not go on to model creation and training. In fact, when I try to run the example I get the same issues as in my notebook.
I think the issues arise because the metrics or the loss function are not appropriate…
I tried to follow this notebook, which I found very interesting, but I cannot get it working on either the COCO dataset or my own.
I tried both a single bbox and multiple bboxes per image, and both long and float coords, but I still get this error:

The size of tensor a must match the size of tensor b at non-singleton dimension…

Can someone share some notes on how to have a model running on an object detection dataset?
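
For anyone hitting the same message: it is PyTorch's generic broadcasting error, raised when two tensors in an element-wise operation have incompatible shapes. Below is a minimal sketch that reproduces it, assuming (hypothetically) that the model returns one box per image while the targets hold two boxes per image. The shapes are made up for illustration and are not taken from the notebook.

```python
import torch

bs = 8
preds = torch.zeros(bs, 4)      # hypothetical model output: one box (4 coords) per image
targs = torch.zeros(bs, 2, 4)   # hypothetical targets: two boxes per image

try:
    diff = preds - targs        # shapes (8, 4) and (8, 2, 4) cannot be broadcast
    error_message = None
except RuntimeError as e:
    error_message = str(e)

print(error_message)
```

Printing preds.shape and targs.shape just before the loss or metric is usually the fastest way to locate the mismatch.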


Hi! Object Detection is not covered in part 1, so I think it would be best to put this in the “advanced” category.

See this thread about Object Detection. It’s using Sylvain’s implementation of RetinaNet here. I’m also working on it, and I should have a working notebook that I’ll feel confident enough to share by the end of the week if all goes well.


Hi @PierreO ,
thank you for your reply.
I tried to run Sylvain's notebook, but it currently seems to be a work in progress: it is missing the usual data untar instruction, and it immediately fails with "name 'ObjectDetectDataset' is not defined".
So currently I think we should find another way.
Just to be more explicit:
I am currently working on an application to detect people's faces with bounding boxes. I know that with multiple bounding boxes and classes the task becomes more difficult, so at the moment I am simplifying it to a regression problem: I only need to find the 2 points that identify the bounding box.
I have already built the dataset and I can correctly run show_batch on it, but I encounter different types of errors when I try to create a learner (maybe because of the loss function, the metrics or the architecture used?).
Do you have any suggestions on how to set up a learner for this situation?

Can someone point me to an example?

thank you,

Yes, the notebook I linked to was made by Sylvain a few months ago and since then the library has changed so you’ll have to change and fix things.
If you want a turnkey solution you’ll have to wait or make your own stuff. You can also use another implementation, one that doesn't use fastai or even PyTorch. See here for example.

Lastly, try not to @ Jeremy and Sylvain directly :slight_smile: (and the handle of Sylvain is sgugger)


thank you for the hint!
I’ll take a look at it even if I don’t plan to develop without fastai :wink: but it could be useful.
Meanwhile I’ll try to go on and keep you updated!

You’re welcome !

I don’t know how advanced you are in Deep Learning but if you’re beginning keep in mind that Object Detection is a hard problem, so please don’t be discouraged if you find it difficult : it’s normal !


Absolutely right. I hadn't expected the concept of Image Segmentation to be easier to understand than Object Detection.


Ciao pietro, you’re simply asking about multi-class multi-label classification. It is a well-studied problem and not a big deal nowadays. Moreover, it is very easy to implement thanks to fastai. But of course it depends on your current understanding of the topic. I suggest you check this out


Maybe you have a typo, he’s asking about Object Detection which is different from multi-label classification. Object Detection is indeed studied in lessons 8 and 9 of part 2 v2, but the code that comes with those lectures is only working with fastai v0.7, not fastai v1.

I’m not aware of any “easy” way to implement Object Detection with fastai v1 right now. One way is to try to port the code from lesson 9 v2 to fastai v1, but I wouldn’t call that easy.


I guess I am doing a simplified form of Object Detection, since I do want to identify bounding boxes, but just one per picture and with only one class. This can be seen as a regression problem, since I only need to find the coords of the two points (top left and bottom right) that define the bounding box.

here you can see that my databunch works fine:

I tried to re-adapt radekosmulski’s notebook which identifies flukes (here) and this is the code:


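from fastai.vision import *  # fastai v1

class FaceDetector(nn.Module):
    def __init__(self, arch=models.resnet18):
        super().__init__()  # required before assigning submodules on an nn.Module
        self.cnn = create_body(arch)  # backbone without its classification head
        self.head = create_head(num_features_model(self.cnn) * 2, 4)  # 4 outputs: the box coords

    def forward(self, im):
        x = self.cnn(im)
        x = self.head(x)
        return 2 * (x.sigmoid_() - 0.5)  # squash outputs into (-1, 1)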
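
A note on the last line of forward: 2 * (sigmoid(x) - 0.5) squashes every output into (-1, 1) and is mathematically the same as tanh(x / 2). The sketch below checks that identity and shows one way such outputs could be mapped back to pixels. The helper name to_pixels and the (top, left, bottom, right) ordering are assumptions for illustration only; check how your own targets are scaled.

```python
import torch

x = torch.linspace(-6.0, 6.0, steps=25)
squashed = 2 * (torch.sigmoid(x) - 0.5)
# squashed values lie strictly inside (-1, 1) and match tanh(x / 2)
same_as_tanh = torch.allclose(squashed, torch.tanh(x / 2), atol=1e-6)

def to_pixels(coords, height, width):
    """Hypothetical helper: map coords in [-1, 1], assumed to be ordered
    (top, left, bottom, right), to pixel units."""
    sizes = torch.tensor([height, width, height, width], dtype=coords.dtype)
    return (coords + 1) / 2 * sizes

box = torch.tensor([[-1.0, -1.0, 0.0, 1.0]])
pixel_box = to_pixels(box, height=100, width=200)  # tensor([[0., 0., 50., 200.]])
```

If the targets are not scaled to the same range as the model output, an L1 loss will still run but the predictions cannot match the boxes.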

loss function:

def loss_fn(preds, targs, class_idxs):
    return nn.L1Loss()(preds, targs.squeeze())
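
The squeeze() call matters because, with one box per image, the targets presumably arrive with an extra box dimension, i.e. shape (bs, 1, 4), while the model outputs (bs, 4). Here is a minimal sketch with made-up tensors (the shapes are an assumption, not read from the databunch). It uses squeeze(1) rather than a bare squeeze(), since the latter would also drop the batch dimension when a batch contains a single image.

```python
import torch
import torch.nn as nn

bs = 8
preds = torch.zeros(bs, 4)     # model output: 4 coords per image
targs = torch.ones(bs, 1, 4)   # assumed target shape: one box per image

loss = nn.L1Loss()(preds, targs.squeeze(1))   # both are now (bs, 4)

# With a single-image batch, a bare squeeze() removes the batch dimension too
single = torch.ones(1, 1, 4)
shape_bare = tuple(single.squeeze().shape)     # (4,)
shape_dim1 = tuple(single.squeeze(1).shape)    # (1, 4)
```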

building learner:

learn = Learner(data, FaceDetector(arch=models.resnet50), loss_func=loss_fn)
learn.metrics = [lambda preds, targs, _: dice(preds, targs.squeeze(), iou=True).mean()]

Here I modified the metrics because I neither found a definition for IoU (intersection over union) nor understood how to write it. On the other hand, I found this “dice” function, which has a bool parameter “iou” (maybe a good replacement?).

Then I try to run learner:

learn.fit_one_cycle(4, 1e-2)

But I find this error:

RuntimeError: expected type torch.FloatTensor but got torch.LongTensor

Any ideas? I think the main problem is in the metrics or the loss function, because the learner starts running and after a while (before finishing an epoch) it throws this exception…

I found how to fix my error.
After checking all other classes from the notebook that detects flukes I found this definition for IoU metric:

def intersection(preds, targs):
    # preds and targs are of shape (bs, 4), pascal_voc format
    max_xy = torch.min(preds[:, 2:], targs[:, 2:])
    min_xy = torch.max(preds[:, :2], targs[:, :2])
    inter = torch.clamp((max_xy - min_xy), min=0)
    return inter[:, 0] * inter[:, 1]

def area(boxes):
    return ((boxes[:, 2]-boxes[:, 0]) * (boxes[:, 3]-boxes[:, 1]))

def union(preds, targs):
    return area(preds) + area(targs) - intersection(preds, targs)

def IoU(preds, targs):
    return intersection(preds, targs) / union(preds, targs)

After executing this code I can correctly run the learner. Actually it gets a value of 0 for each epoch at the moment, I guess because the coords are in a different format compared to the ones used in that notebook.
But the good news is that now it runs!
I’ll post news here when I manage to get some results…
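
A quick way to sanity-check an IoU metric like the one above is to run it on hand-made boxes whose overlap is easy to work out by hand. The sketch below restates the functions compactly so that it runs on its own; the example boxes are invented, and the (x_min, y_min, x_max, y_max) ordering is an assumption. The result is only meaningful when each box lists its minimum corner before its maximum corner.

```python
import torch

def intersection(preds, targs):
    # boxes are (bs, 4), assumed ordered (x_min, y_min, x_max, y_max)
    max_xy = torch.min(preds[:, 2:], targs[:, 2:])
    min_xy = torch.max(preds[:, :2], targs[:, :2])
    inter = torch.clamp(max_xy - min_xy, min=0)
    return inter[:, 0] * inter[:, 1]

def area(boxes):
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

def IoU(preds, targs):
    inter = intersection(preds, targs)
    return inter / (area(preds) + area(targs) - inter)

preds = torch.tensor([[0., 0., 2., 2.],    # partial overlap with its target
                      [0., 0., 1., 1.],    # identical to its target
                      [0., 0., 1., 1.]])   # disjoint from its target
targs = torch.tensor([[1., 1., 3., 3.],
                      [0., 0., 1., 1.],
                      [2., 2., 3., 3.]])

scores = IoU(preds, targs)   # 1/7, 1 and 0 respectively
```

If a check like this gives the expected values but the metric stays at 0 during training, comparing the coordinate order and scale of the predictions against those of the targets is a reasonable next step.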


Do any of the notebooks work with PyTorch 1.0 and the latest fastai library (v 1.0.34)?

Not the RetinaNet ones, but this SSD implementation should work with PyTorch and fastai latest version. I’ve not tried it myself.

Also, I don’t know if it’s on purpose, but you’re a bit behind on your fastai version: it’s now at 1.0.42 if memory serves.


Thanks for the update, please let us know if you make any further progress.
My project has a similar requirement, but it will also include images with no object at all (actually, most of the images will be without an object). Do you think this direction will still work?

I replicated the fastai pascal notebook for UECfooddata256. Everything works fine on the validation data, but I am unable to get results on new data.

Please help me out: what am I missing? Or does learn.predict() not work for object detection?


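learn.predict does not work. Instead, do this :slight_smile:
img2 = open_image(base_dir+'/models/object_detection/offset.jpg')

# NOTE: `img` is not defined in the snippet as posted; presumably it is the same image loaded with PIL
img = pil2tensor(img,np.float32)
img = img.div_(255)
with torch.no_grad():
    # the line computing `output` is missing from the post; presumably a forward pass such as:
    output = learn.model(img.unsqueeze(0))  # assumption, not from the original post
show_preds(img2, output, 0, detect_thresh=0.4, classes=data.classes)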


can you please tell me what is the best mAP value you managed to get with that notebook?

Hi @pietro.latorre. You might find this useful:


Hi there! Did you solve it?

Hi Pietro
Thanks for sharing! I would also like to have a working notebook on object detection, but using the SVHN dataset, because it has annotated bounding boxes. Do you know if SSD would work well on this dataset? I have started experimenting with SSD on it but have yet to get satisfactory results. Online search results tend to show examples of SSD on either Pascal-VOC or COCO, but not on SVHN. This makes me wonder whether SVHN is suitable for an SSD implementation. I would appreciate your comments if you know this dataset.