Object detection using fast.ai+pytorch+Faster R-CNN

Hi everybody!

I have been working with the TensorFlow Object Detection API + Faster R-CNN to detect dead trees in large aerial/satellite images. The images are huge, so they are split into 600x600 tiles with a moving window. The training dataset is not very large (2000 images), so I use transfer learning as described in the API docs to train the last layer of the model, which works quite well.
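
As a rough illustration of the tiling step, here is a minimal Python sketch that computes window coordinates for a large image. The function names, the stride parameter, and the choice to shift the last window back so that it ends at the image border are assumptions for illustration; the post does not say how much the windows overlap.

```python
def window_starts(length, tile, stride):
    """Start offsets along one axis so that the windows cover the full length."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    # Assumption: shift one extra window back so the image border is covered.
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def tile_windows(width, height, tile=600, stride=600):
    """Return (x1, y1, x2, y2) pixel windows in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in window_starts(height, tile, stride)
        for x in window_starts(width, tile, stride)
    ]


windows = tile_windows(1500, 600)
```

A stride smaller than the tile size gives overlapping windows, which can help when an object would otherwise be cut by a tile border.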

Since I started the #part1-v2 fast.ai course, I have been wondering whether everything I am doing with TensorFlow could be faster and easier with fast.ai+PyTorch. The neat features of the fast.ai library, like the learning rate finder and Stochastic Gradient Descent (SGD) with restarts, also make it very appealing to try this approach! Unfortunately I haven't found any helpful info on this subject, either here or on the PyTorch forum…

So my question to the people of this lovely forum: has anybody already tried object detection with the fast.ai library using pretrained PyTorch Faster R-CNN, R-FCN, or SSD models, or could anyone point me in a good direction to start? @jeremy, or is this going to be covered in #part2 by any chance?

I have found two interesting PyTorch implementations, one of Faster R-CNN and one of SSD, that could be useful for this:

I am looking forward to your responses :grinning:


Hi, I saw you haven’t got any responses. Have you been able to implement this using fastai?

You can find Object Detection model with fast.ai v1 in this thread: Object detection in fast.ai v1

In detail, we have SSD and RetinaNet.

Hope that helps


Recently torchvision was updated with pretrained models for Faster R-CNN (and more):

I’ve been trying to get it to play nice with fastai but it seems to be incompatible, or am I missing something?

I had it up to the point where I was able to create a Learner (with its constructor), but the main problem seems to be that the loss function is baked into the model. In training mode the model expects images and targets (model(images, targets)) so that the model can produce the loss(es).
The fastai Learner's fit() (or actually in loss_batch()) would only pass images.
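
To make the mismatch concrete, here is a small sketch. StubDetector is a hypothetical stand-in that only mimics the calling convention described above (a dict of losses when called with images and targets in training mode, a list of predictions in eval mode); it is not torchvision code and its loss values are made up. The loss key names follow those that torchvision's Faster R-CNN returns, as far as I know. SummedLoss is one possible wrapper that reduces the dict to a single scalar that a generic training loop could backpropagate.

```python
import torch
from torch import nn


class StubDetector(nn.Module):
    """Hypothetical stand-in: loss dict in train mode, predictions in eval mode."""

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(1.0))

    def forward(self, images, targets=None):
        if self.training:
            if targets is None:
                raise ValueError("targets are required in training mode")
            # Made-up "loss": mean pixel value scaled by a learnable weight.
            s = torch.stack([img.mean() for img in images]).mean() * self.w
            return {
                "loss_classifier": s,
                "loss_box_reg": 0.5 * s,
                "loss_objectness": 0.25 * s,
                "loss_rpn_box_reg": 0.25 * s,
            }
        return [{"boxes": torch.zeros(0, 4)} for _ in images]


class SummedLoss(nn.Module):
    """Wrap a detector so that training mode returns one scalar loss."""

    def __init__(self, detector):
        super().__init__()
        self.detector = detector

    def forward(self, images, targets=None):
        out = self.detector(images, targets)
        return sum(out.values()) if self.training else out


images = [torch.full((3, 8, 8), 0.5) for _ in range(2)]
targets = [
    {"boxes": torch.tensor([[1.0, 1.0, 4.0, 4.0]]), "labels": torch.tensor([1])}
    for _ in range(2)
]

wrapped = SummedLoss(StubDetector())
wrapped.train()
loss = wrapped(images, targets)  # scalar tensor, ready for loss.backward()
wrapped.eval()
preds = wrapped(images)          # one prediction dict per image
```

The wrapper does not solve the data side: the training loop still has to hand the targets to the model instead of to a separate loss function.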

I see how some of the complexity was abstracted away like this, but still… thinking of compatibility, why did they decide to do this?
Is there some feature in fastai that I’ve missed that anticipates this sort of behavior?

I’ve been at it a few days now but so far haven’t been able to make it work. I guess to get it working one would need to:
a) clone and alter most of the torchvision detection code; or
b) create a custom learner?; or
c) hopefully something better that I haven’t discovered yet :wink:

Has anyone else had more luck with this?


Hi (Moin),

I'm trying to achieve the same, but I'm facing the same challenges as you. Any progress on your side?

With kind regards,

Hi Christian,

I started with altering the torchvision detection code.
I changed GeneralizedRCNN's forward() so that, while training, it no longer calculates the losses but only returns a dict with the images and the features that come out of the backbone. Then I took rpn and roi_heads and added them to a custom loss function. If I had created a custom DataSet (or pipeline? I haven’t read up on that yet) that could supply the model with the images and targets simultaneously, I wouldn’t have had to move those into the loss function, but somehow I was determined to use an ObjectItemList to create a DataBunch ;D

fit() ran, but the losses went to infinity or NaN. More specifically, I noticed loss_rpn_box_reg losses started to contain infs or NaNs.

At first I figured it was because I wasn’t supplying the target box coords in the right order, but now I guess it’s because I thought fastai could take care of transforms so I disabled the transforms in GeneralizedRCNN. I haven’t looked closely at the transforms but there might be some necessary normalization or something that I overlooked :sweat_smile:
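
If the box order is a suspect, a small conversion helper makes the convention explicit. This sketch assumes that fastai v1's ImageBBox stores boxes as (top, left, bottom, right) scaled to [-1, 1], and that torchvision's detection models expect pixel (x1, y1, x2, y2); both are worth checking against the versions in use. The function names are made up for illustration. The second helper flags boxes with near-zero width or height, since box regression targets typically involve the log of a width or height ratio, which makes such boxes one plausible source of inf/NaN losses.

```python
def fastai_to_xyxy(box, height, width):
    """Convert (top, left, bottom, right) in [-1, 1] to pixel (x1, y1, x2, y2).

    Assumes the fastai v1 convention described in the lead-in.
    """
    top, left, bottom, right = box
    y1 = (top + 1) * height / 2
    x1 = (left + 1) * width / 2
    y2 = (bottom + 1) * height / 2
    x2 = (right + 1) * width / 2
    return (x1, y1, x2, y2)


def is_degenerate(xyxy, min_size=1.0):
    """True if the pixel box is narrower or shorter than min_size."""
    x1, y1, x2, y2 = xyxy
    return (x2 - x1) < min_size or (y2 - y1) < min_size


full_image = fastai_to_xyxy((-1, -1, 1, 1), height=600, width=600)
partial = fastai_to_xyxy((-0.5, 0.0, 0.5, 1.0), height=600, width=400)
```

Filtering or logging degenerate boxes before they reach the model is a cheap way to rule this cause in or out.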

I’ll give it another go soon. Have you made more progress?


@r3v @Bronzi88 I ran into the same question. Did any of you come up with a solution?

Hi Denis,

For my use case I found out that I’d be better off with YOLO (for quicker inference speed), which I also tried to implement using fastai. The issues and solution are very similar. The model takes both the images and targets to calculate the loss.

What I did was:

  • create a Dataset that returns both training data and targets in __getitem__
  • create the architecture and load the pre-trained weights
  • create a fake loss function (since the model calculates the loss) which is literally def fakeLoss(out, yb): return out (this is a total hack, and maybe I overlooked a more elegant way to do this :smiley:)
  • create the learner with Learner(data, model, loss_func=fakeLoss)
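
The steps above can be sketched in plain PyTorch. Everything here is a toy stand-in: ToyDetectionDataset, LossInsideModel, and the manual loop are hypothetical and replace the real YOLO model and the fastai Learner. The loop mimics what, as far as I understand, fastai v1's loss_batch does, namely calling model(*xb) and then loss_func(out, *yb), so a Dataset that returns ((image, target), dummy_y) lets both image and target reach the model.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class ToyDetectionDataset(Dataset):
    """Hypothetical dataset: each item is ((image, target), dummy_y)."""

    def __init__(self, n=8):
        g = torch.Generator().manual_seed(0)
        self.images = torch.rand(n, 3, 16, 16, generator=g)
        self.targets = torch.rand(n, 4, generator=g)  # one fake box per image

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        # The dummy y is never used; it only keeps the (x, y) item shape.
        return (self.images[i], self.targets[i]), torch.zeros(1)


class LossInsideModel(nn.Module):
    """Toy stand-in for a detector that computes its own loss in forward()."""

    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 4))

    def forward(self, images, targets):
        return nn.functional.smooth_l1_loss(self.head(images), targets)


def fake_loss(out, yb):
    """Pass-through: the model output already is the loss."""
    return out


torch.manual_seed(0)
dl = DataLoader(ToyDetectionDataset(), batch_size=4)
model = LossInsideModel()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for epoch in range(5):
    for xb, yb in dl:
        out = model(*xb)           # xb unpacks to (images, targets)
        loss = fake_loss(out, yb)  # returns the model's own loss unchanged
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
```

With a real detector the targets differ in length per image, so the default collate function would need to be replaced; that part is left out here.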

I got it to train and losses were going down, but interpreting the data and results was difficult.

Better not use a Dataset but instead override the ObjectItemList and ImageBBox classes (or write a custom ItemList), since using a Dataset has its limitations (see https://docs.fast.ai/basic_data.html#Functions-that-really-won’t-work). The ‘custom’ bit here would be mostly overriding how the data is prepped for the model. This way it should be easier to leverage more of fastai’s sweet data augmentation stuff, too.

Object Detection metrics are tricky and I was looking into mean average precision but hadn’t really continued working on all this. Maybe see https://github.com/rafaelpadilla/Object-Detection-Metrics for more details.
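
As a starting point, most detection metrics, mean average precision included, build on intersection over union (IoU) between a predicted box and a ground-truth box. Here is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) in pixels; the function name is just for illustration, and this is only the building block, not a full mAP implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))  # 25 / 175
```

A prediction is then usually counted as a true positive when its IoU with an unmatched ground-truth box of the same class exceeds a threshold such as 0.5.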

Fastai v2 is developing rapidly and sadly I haven’t got enough time to keep up :frowning: but I hope in v2 Object Detection will become even easier.

Hope this was useful


Thanks for your detailed answer @r3v.
Just to confirm: did you reimplement YOLO yourself or use an existing implementation? Where did you get the pre-trained weights from?

Hi Denis,

I used the PyTorch YOLOv3 implementation from https://github.com/DeNA/PyTorch_YOLOv3 . The pre-trained weights are the darknet weights available on the official site. This repo also shows how to load them.

To get back on topic: for pytorch-vision’s Faster R-CNN the steps are fairly similar, I think.
