I have been working with the TensorFlow Object Detection API + Faster R-CNN to detect dead trees in large aerial/satellite images. The images are huge, so they are split into 600×600 tiles with a moving window. The training dataset is not very large (2000 images), so I use transfer learning as described in the API docs to train only the last layer of the model, which works quite well.
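For context, the tiling itself is simple; here is a minimal sketch of such a moving-window split (the function name tile_image and the non-overlapping stride are illustrative, not my actual pipeline):

```python
import numpy as np

def tile_image(img: np.ndarray, size: int = 600, stride: int = 600):
    """Yield (x, y, tile) crops from an H x W x C image, moving-window style."""
    h, w = img.shape[:2]
    for y in range(0, max(h - size, 0) + 1, stride):
        for x in range(0, max(w - size, 0) + 1, stride):
            yield x, y, img[y:y + size, x:x + size]
```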
Since starting the #part1-v2 fast.ai course, I have been wondering whether everything I am doing with TensorFlow couldn't be done faster and more easily using fast.ai + PyTorch. Also, the neat features of the fast.ai library, like the learning rate finder, stochastic gradient descent with restarts (SGDR), and so on, make this approach very appealing to try! Unfortunately I haven't found any helpful info on this subject here or on the PyTorch forum…
So my question to the people of this lovely forum: has anybody already tried to do object detection with the fast.ai library using pretrained PyTorch Faster R-CNN, R-FCN, or SSD models, or could you point me in a good direction on where to start? @jeremy, is this by any chance going to be covered in #part2?
I have found two interesting PyTorch implementations, one of Faster R-CNN and one of SSD, that could be useful for this:
Recently torchvision was updated with pretrained models for Faster R-CNN (and more):
I’ve been trying to get it to play nice with fastai but it seems to be incompatible, or am I missing something?
I got to the point where I was able to create a Learner (with its constructor), but the main problem seems to be that the loss function is baked into the model. In training mode the model expects both images and targets, i.e. model(images, targets), so that it can produce the loss(es) itself.
fastai's Learner.fit() (or, more precisely, loss_batch()) only passes the images to the model.
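To make the mismatch concrete, this is roughly how torchvision's detection models are called in training mode (with dummy tensors just for illustration):

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.train()

images = [torch.rand(3, 600, 600)]                            # list of CHW tensors
targets = [{"boxes": torch.tensor([[50., 50., 200., 200.]]),  # xyxy coords
            "labels": torch.tensor([1])}]

loss_dict = model(images, targets)  # e.g. loss_classifier, loss_rpn_box_reg, ...
loss = sum(loss_dict.values())      # fastai's loss_batch never passes targets here
```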
I see how some of the complexity was abstracted away this way, but still… thinking of compatibility, why did they decide to do this?
Is there some feature in fastai that I’ve missed that anticipates this sort of behavior?
I’ve been at it for a few days now but so far haven’t been able to make it work. I guess that to get it working one would need to:
a) clone and alter most of the torchvision detection code; or
b) create a custom learner?; or
c) hopefully something better that I haven’t discovered yet
I started with altering the torchvision detection code.
I changed GeneralizedRCNN's forward() to not calculate the losses but instead (while training) return a dict with the images and the features that came out of the backbone. Then I stole rpn and roi_heads and moved them into a custom loss function. If I had created a custom Dataset (or pipeline? I haven’t read up on that yet) that could supply the model with the images and targets simultaneously, I wouldn’t have had to move those into the loss function, but somehow I was determined to use an ObjectItemList to create a DataBunch ;D
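Roughly, the custom loss ended up shaped like this (a sketch from memory; it assumes the altered forward() returns the transformed images and backbone features in a dict, and it mirrors how torchvision's GeneralizedRCNN.forward() calls rpn and roi_heads):

```python
import torch.nn as nn

class RCNNLoss(nn.Module):
    # Sketch: the detection heads live in the loss function instead of the
    # model, because fastai's loss_batch calls loss_func(model_output, *targets).
    def __init__(self, rpn, roi_heads):
        super().__init__()
        self.rpn, self.roi_heads = rpn, roi_heads

    def forward(self, out, targets):
        images, features = out["images"], out["features"]  # ImageList, feature dict
        proposals, rpn_losses = self.rpn(images, features, targets)
        _, det_losses = self.roi_heads(features, proposals,
                                       images.image_sizes, targets)
        return sum({**rpn_losses, **det_losses}.values())
```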
fit() ran, but the losses went to infinity or NaN. More specifically, I noticed that loss_rpn_box_reg started to contain infs or NaNs.
At first I figured it was because I wasn’t supplying the target box coordinates in the right order. Now I suspect it’s because I disabled the transforms in GeneralizedRCNN, assuming fastai could take care of them. I haven’t looked closely at those transforms, but they may include some necessary normalization or similar that I overlooked.
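If I remember correctly, torchvision's GeneralizedRCNNTransform normalizes with the ImageNet statistics (besides resizing), so bypassing it means redoing at least that step yourself, e.g.:

```python
import torch

# ImageNet mean/std, which GeneralizedRCNNTransform uses by default
mean = torch.tensor([0.485, 0.456, 0.406])[:, None, None]
std  = torch.tensor([0.229, 0.224, 0.225])[:, None, None]
img = (img - mean) / std  # img: float CHW tensor with values in [0, 1]
```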
I’ll give it another go soon. Have you made more progress?
For my use case I found out that I’d be better off with YOLO (for faster inference), which I also tried to implement using fastai. The issues and solutions are very similar: the model takes both the images and the targets to calculate the loss.
What I did was (a rough sketch follows the list):
create a Dataset that returns both training data and targets in __getitem__
create the architecture and load the pre-trained weights
create a fake loss function (since the model calculates the loss itself) which is literally def fakeLoss(out, yb): return out (this is a total hack, and maybe I overlooked a more elegant way to do it)
create the learner with Learner(data, model, loss_func=fakeLoss)
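Put together, the fastai side looked roughly like this (a sketch; MyYoloDataset is a hypothetical placeholder for whatever Dataset you write, and model stands for the pretrained network):

```python
from fastai.basic_data import DataBunch
from fastai.basic_train import Learner

# MyYoloDataset is a placeholder: its __getitem__ packs images *and* targets
# into the model input (plus a dummy y), so the model can compute its own
# loss inside forward().
train_ds, valid_ds = MyYoloDataset("train"), MyYoloDataset("valid")
data = DataBunch.create(train_ds, valid_ds, bs=8)

def fakeLoss(out, *yb):
    return out  # the model output already *is* the loss

learn = Learner(data, model, loss_func=fakeLoss)
learn.fit(5)
```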
I got it to train and losses were going down, but interpreting the data and results was difficult.
Better not to use a plain Dataset, but instead to override the ObjectItemList and ImageBBox classes (or write a custom ItemList), since using a Dataset has its limitations (see https://docs.fast.ai/basic_data.html#Functions-that-really-won’t-work). The ‘custom’ bit here would mostly be overriding how the data is prepped for the model. This way it should also be easier to leverage more of fastai’s sweet data augmentation.
Thanks for your detailed answer @r3v.
Just to confirm: did you reimplement YOLO yourself or use an existing implementation? Where did you get the pre-trained weights from?
I used the PyTorch YOLOv3 implementation from https://github.com/DeNA/PyTorch_YOLOv3 . The pre-trained weights are the darknet weights available on the official site. This repo also shows how to load them.
To get back on topic: for torchvision’s Faster R-CNN the steps are fairly similar, I think.