Object detection in fast.ai v1

Thanks again @Bronzi88 for the notebook! I am currently working with a meter reading dataset, trying to extract the digits of the reading.
I was able to pretrain the RetinaNet by resizing all images to 256x256, since, as you said, it currently only works with that size.

I tried to use discriminative learning rates when training with learn.fit_one_cycle, but it throws a “Division by zero” error.
Am I doing something wrong, or is this not implemented yet? I noticed you don’t use it in your notebook either.


Check your layer groups, probably…

This works very well, thanks for sharing!

Do you have any tips on inference? When we train we feed fixed size images (eg 256x256) to the network, but at test time our images aren’t always of this shape.

How did you handle working with other sizes of images? Predict for each possible slice of the image and combine bounding boxes? Also, did you use fastai’s ‘get_preds()’?

Dear Christian
This is regarding the PascalVoc metric that you have incorporated into callbacks.py in your GitHub repository for object detection.

In the code below, I see that to work around the NMS performance issue, only a subset of the predictions for an image is passed to NMS. But won’t we miss better box predictions this way? The sorting by score happens inside NMS, so shouldn’t we first sort by score, then take the subset, and only then pass it to NMS?

        # if the number is too high, evaluation is very slow
        total_nms_examples = len(class_gt) * 3
        bbox_pred = bbox_pred[:total_nms_examples]
        scores = scores[:total_nms_examples]
        preds = preds[:total_nms_examples]
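
One way to address this concern, as a minimal sketch and not the repository's actual code, is to sort the predictions by score before truncating, so that the cut keeps the highest-scoring boxes. The helper name `top_k_by_score`, the toy tensors, and their shapes are assumptions made for illustration.

```python
import torch

def top_k_by_score(bbox_pred, scores, preds, k):
    """Keep the k highest-scoring predictions (hypothetical helper)."""
    # Sort scores in descending order and remember the permutation.
    sorted_scores, order = scores.sort(descending=True)
    keep = order[:k]
    # Apply the same selection to boxes and class predictions.
    return bbox_pred[keep], sorted_scores[:k], preds[keep]

# Toy data: 5 boxes, with the two best-scoring ones at the end.
bbox_pred = torch.arange(20, dtype=torch.float32).view(5, 4)
scores = torch.tensor([0.1, 0.2, 0.3, 0.9, 0.8])
preds = torch.tensor([0, 1, 2, 3, 4])

top_boxes, top_scores, top_preds = top_k_by_score(bbox_pred, scores, preds, k=2)
```

With plain slicing (`scores[:2]`) the two best boxes in this toy example would be dropped; sorting first keeps them.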

Since part 2 took a deep dive behind the curtains and didn’t include practical examples, will there be a part 3 where object detection and bounding boxes are covered for fast.ai v1 on a practical level?

It will be covered in the extra lessons.

Fantastic. Is there any ETA for when the extra lessons will be published? Didn’t notice any mention of it in the lectures or other materials. Just thinking if I should invest the time to learn it with other libraries or wait for the Fast.ai lessons to arrive.

No, we’re fully focused on the development and release of v2 first. Extra lessons will be after.


Got it, thanks. Looking forward to v2! Really appreciate what you guys do.

Same question, did you solve it?

Hi @KarlH,

Did you solve the issue with the ‘background’ data? I could create a databunch with both labelled and unlabelled data. Everything seems ok but when I tried to fit the data I got the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-64-f250d77c386e> in <module>
----> 1 learn.fit_one_cycle(4, 1e-3, wd=1e-3)

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     21     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     22                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 23     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     25 def fit_fc(learn:Learner, tot_epochs:int=1, lr:float=defaults.lr,  moms:Tuple[float,float]=(0.95,0.85), start_pct:float=0.72,

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         else: self.opt.lr,self.opt.wd = lr,wd
    199         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102                 if cb_handler.on_batch_end(loss): break

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     29     if not loss_func: return to_detach(out), to_detach(yb[0])
---> 30     loss = loss_func(out, *yb)
     32     if opt is not None:

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/DeepLearning/ObjectDetection/loss/RetinaNetFocalLoss.py in forward(self, output, bbox_tgts, clas_tgts)
     53         focal_loss = torch.tensor(0, dtype=torch.float32).to(clas_preds.device)
     54         for cp, bp, ct, bt in zip(clas_preds, bbox_preds, clas_tgts, bbox_tgts):
---> 55             bb, focal = self._one_loss(cp, bp, ct, bt)
     57             bb_loss += bb

~/DeepLearning/ObjectDetection/loss/RetinaNetFocalLoss.py in _one_loss(self, clas_pred, bbox_pred, clas_tgt, bbox_tgt)
     29     def _one_loss(self, clas_pred, bbox_pred, clas_tgt, bbox_tgt):
---> 30         bbox_tgt, clas_tgt = self._unpad(bbox_tgt, clas_tgt)
     31         matches = match_anchors(self.anchors, bbox_tgt)
     32         bbox_mask = matches >= 0

~/DeepLearning/ObjectDetection/loss/RetinaNetFocalLoss.py in _unpad(self, bbox_tgt, clas_tgt)
     16     def _unpad(self, bbox_tgt, clas_tgt):
---> 17         i = torch.min(torch.nonzero(clas_tgt - self.pad_idx))
     18         return tlbr2cthw(bbox_tgt[i:]), clas_tgt[i:] - 1 + self.pad_idx

RuntimeError: invalid argument 1: cannot perform reduction function min on tensor with no elements because the operation does not have an identity at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathReduce.cu:64

Checking the forums, it seems to be something related to the images/labels, so I guess it is due to the [0,0,0,0] ['background'] trick. Any idea how to solve it?
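
The traceback above ends in `_unpad`: `torch.nonzero(clas_tgt - self.pad_idx)` returns an empty tensor when every class target equals `pad_idx`, and `torch.min` cannot reduce an empty tensor. Below is a minimal sketch of a guard for that case. `first_non_pad_index` is a hypothetical helper, and the default `pad_idx=0` is an assumption; how the loss should then treat an image without objects is not settled in this thread.

```python
import torch

def first_non_pad_index(clas_tgt, pad_idx=0):
    """Return the index of the first non-padding label, or None if all labels are padding."""
    non_pad = torch.nonzero(clas_tgt - pad_idx)
    if non_pad.numel() == 0:
        # Every target is padding, i.e. the image has no labelled objects.
        return None
    return int(non_pad.min())

with_objects = torch.tensor([0, 0, 3, 1])   # two padded slots, then two labels
background_only = torch.tensor([0, 0, 0])   # nothing but padding

idx_objects = first_non_pad_index(with_objects)
idx_background = first_non_pad_index(background_only)
```

A caller could skip the bounding-box loss when the result is `None` instead of slicing the targets.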

Very nice work by @Bronzi88. Running fastai version 1.0.57, I got the same error message as @Joan when trying to run the examples/CocoTiny_Retina_Net.ipynb notebook. I was able to fix it by using squishing for resizing and removing the default transforms. This seems to avoid bboxes ending up outside of the transformed image.

data = (ObjectItemList.from_folder(...)  # path and the split/label steps are elided in the original post
        .transform(tfm_y=True, size=size, resize_method=ResizeMethod.SQUISH)
        .databunch(bs=64, collate_fn=bb_pad_collate))

Hi @hallvagi, in my case this does not solve the problem. Is there anything else that you tried?
How is your unlabelled data represented? [‘background’] and [0,0,0,0]?

Ok, my bad. I was using the coco_sample dataset, and it only has labelled data.

When removing the labels from one of the images with img2bbox['000000318219.jpg'] = [[[0., 0., 0., 0.]],['background']] the notebook still runs fine, but the databunch now has 2 ‘background’ classes.

Using img2bbox['000000318219.jpg'] = [[[0., 0., 0., 0.]],[]] leaves the number of classes correct, and also works in my case.

But when using img2bbox['000000318219.jpg'] = [[[]],[]] it crashes with an index out of bounds error.

Hi, I am stuck with the same error and unable to understand which data is causing it. I am trying to run the notebook given in course-v3 dl2.

Great work! @Bronzi88 and thank you for sharing.

I am attempting to apply your methods to another dataset from LLNL.

I am able to get the data bunch working and have the bounding boxes loaded correctly.

I am also able to get the anchors working…

However, after loading the model and when I start to find the learning rate, it crashes:

At first, I thought it was because I had an image without a bounding box but I went through the images and they all have a bb. Would you have any guidance on how I can fix this issue?



Yes, your number of boxes doesn’t match the number of boxes the model expects.

So this line:
anchors = create_anchors(sizes=[(32,32)], ratios=[1], scales=[0.35, 0.5, 0.6])

has to be consistent with this line:
model = RetinaNet(encoder, n_classes=data.train_ds.c, n_anchors=3, sizes=[32], chs=32, final_bias=-4., n_conv=2)
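
As a rough sanity check, and assuming (as in the course notebook's anchor helper) that one anchor is created per ratio/scale combination at every grid cell, the expected counts can be computed by hand. The function names below are hypothetical and only illustrate the arithmetic.

```python
# Values taken from the two lines above.
sizes = [(32, 32)]
ratios = [1]
scales = [0.35, 0.5, 0.6]

def anchors_per_location(ratios, scales):
    """Anchors per grid cell; this should equal the model's n_anchors (assumed convention)."""
    return len(ratios) * len(scales)

def total_anchors(sizes, ratios, scales):
    """Total number of anchors over all feature-map grids."""
    cells = sum(h * w for h, w in sizes)
    return cells * anchors_per_location(ratios, scales)

n_anchors = anchors_per_location(ratios, scales)   # matches n_anchors=3 in the model
n_total = total_anchors(sizes, ratios, scales)
```

If ratios or scales are changed in `create_anchors`, `n_anchors` in the model has to change with them, and the grid sizes have to match the feature maps the model actually produces.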

With kind regards,


Thanks @Bronzi88 - I was able to build a model with your help. I wasn’t able to add different transform/augmentation techniques, as I believe the bounding boxes were getting transformed outside the image boundary, but I’ll keep working on that later. I am currently stuck on getting a small inference pipeline working. I have a folder of test images; they are all PNGs and have been converted to greyscale, which is the same format that my model was trained on. I’m trying to follow the two examples that you mentioned earlier in this thread but am having a difficult time getting it to work.
It looks like the image needs to be converted from an array to an image tensor which is then fed through a small process of inference. Below is where I am currently.

I’m currently dealing with data structure errors where I think the pil2tensor process is not getting the data in the correct format required for inference. For example, this type of error

 RuntimeError: Given groups=1, weight of size 64 3 7 7, expected input[1, 2, 256, 256] to have 3 channels, but got 2 channels instead

Could you offer some guidance or code suggestions on how to pass an image from a test_directory to get the model output returned for this use case?

Thank you very much!
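
A possible reading of the channel error above, offered as a hedged sketch: a 2-channel input often means a greyscale image with an alpha channel, while the first convolution of the encoder expects 3 channels. Converting at load time (for example with PIL's `Image.convert('RGB')`) is one option. The snippet below instead fixes the tensor directly; `to_three_channels` is a hypothetical helper, and treating the second channel as alpha is an assumption.

```python
import torch

def to_three_channels(img):
    """Convert a (C, H, W) image tensor to (3, H, W). Hypothetical helper."""
    c = img.shape[0]
    if c == 3:
        return img
    if c == 4:
        # RGBA: drop the alpha channel.
        return img[:3]
    if c in (1, 2):
        # Grey, or grey + alpha (assumed): repeat the grey channel three times.
        return img[:1].repeat(3, 1, 1)
    raise ValueError(f"unsupported number of channels: {c}")

grey_alpha = torch.rand(2, 256, 256)              # same shape as in the error message
batch = to_three_channels(grey_alpha).unsqueeze(0)  # add the batch dimension
```

The resulting `batch` has the shape `[1, 3, 256, 256]` that the first convolution in the error message expects.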

I have a question/problem regarding the Pascal Notebook from the fastai-course-v3 and I hope that maybe someone can point me in the right direction :slight_smile:

The Pascal Notebook itself works fine for my own object detection dataset. After playing around with it for some time, I wanted to try my own trained ResNet as a backbone.

However, I am stuck at using my saved ResNet model for the RetinaNet as a backbone…

Some thoughts I had so far:

  • When loading my own model as a Learner (e.g. with own_resnet18.load('res18-stage5')), it is associated with the dataset I used for training.
  • The RetinaNet is built with a pretrained ResNet from the model zoo and the last layers need to be cut off.
  • If I access the model of the Learner object via own_resnet18.model, I can manually cut off the last layers, but there is an additional Sequential wrapped around the remaining architecture (this looks different from the encoder in the Pascal notebook).

So I think I can manage to get rid of that extra Sequential around my own model, but is that even the right way?
Or, to be more precise: are the weights still stored inside the Sequential container? (I bet not ? :sweat_smile: and also I am sorry if this is a stupid question, I am still new to all this stuff…)

I would really appreciate any help or hints to how I can reuse an existing trained model for another task :slight_smile:

Never mind, I guess I was too tired to make it work yesterday. I found the solution today: I used the state dict to transfer the weights :slight_smile:
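
The poster did not share code, so the following is only a sketch of what a state-dict transfer can look like in plain PyTorch. The tiny `make_body` network stands in for a ResNet body, and all names are hypothetical. It also illustrates the earlier question: the weights do live inside the wrapping Sequential, with the wrapper's index appearing as a prefix in the state-dict keys.

```python
import torch
import torch.nn as nn

def make_body():
    """Tiny stand-in for a ResNet body (hypothetical)."""
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 16, 3, padding=1),
    )

# "Trained" classifier: a body plus a head, wrapped in an outer Sequential.
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 5))
trained = nn.Sequential(make_body(), head)

# Fresh encoder with the same architecture as the body, for the new task.
encoder = make_body()

# Keys of the trained body look like "0.0.weight"; strip the outer "0." prefix
# so that they match the encoder's own keys ("0.weight", "2.bias", ...).
prefix = "0."
body_state = {
    k[len(prefix):]: v
    for k, v in trained.state_dict().items()
    if k.startswith(prefix)
}
encoder.load_state_dict(body_state)
```

After this, `encoder` holds the same weights as the body of `trained` and can be passed on as the backbone, provided the architectures match layer for layer.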
