Object detection in fast.ai v1

I’m trying to adapt the RetinaNet notebook to a dataset that contains empty images with no objects. Does anyone know how I might go about adapting the current notebook to accommodate this?

Right now the notebook works fine as is when I use the subset of the data that contains labelled objects, so everything is good there. When I try to add in images without labels, I get errors. Specifically, the create method of the ImageBBox class tries to index into an empty list.

Does anyone know a good format for representing empty images in bounding box format? I’ve considered adding some dummy coordinates like [0, 0, 0, 0] but I’m concerned that will have weird effects on training.

Hi (Moin),

If you use [0, 0, 0, 0] with the background class as the label, that should work just fine.

With kind regards,

Gave this a try. It causes some sort of numerical issue. When I create a dataloader, I get a ton of UserWarning: Tensor is int32: upgrading to int64; for better performance use int64 input

When I try to call data.show_batch(rows=3), the kernel dies. The issue repeats.

Maybe extending the bounding box for the background to the entire picture area helps to avoid this problem?


in case you don’t have any labels

def get_y_func(o):
    if str(o.id) in img2bbox: #Labels?
        return img2bbox[str(o.id)]
    else: #No labels?
        return [[[0,0,0,0]], ['background']]

works just fine for me.
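To sanity-check the fallback before building the databunch, the function can be exercised on stand-in items. In the sketch below, `img2bbox` holds made-up data and `Item` is a hypothetical stand-in for whatever object carries the `id` attribute in the real pipeline.

```python
from collections import namedtuple

# Hypothetical stand-in for the real items, which only need an `id` attribute here.
Item = namedtuple("Item", ["id"])

# Made-up labels: image id -> [list of boxes, list of class names].
img2bbox = {"17": [[[10, 20, 110, 140]], ["cat"]]}

def get_y_func(o):
    key = str(o.id)
    if key in img2bbox:                      # labelled image
        return img2bbox[key]
    return [[[0, 0, 0, 0]], ["background"]]  # empty image: one dummy box

labelled = get_y_func(Item(17))
empty = get_y_func(Item(99))
print(labelled)
print(empty)
```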

This does work for dealing with empty images. You do end up with an extra class (‘background’ is listed twice in data.classes and data.c is 3), but monkey patching these seems to work.

The dead kernel thing is still an issue, which is weird. It seems like if the dataloader gets a batch of empty images, something goes wrong. I’m doing this on Windows 10 so this could be an OS specific issue. My first thought was maybe it’s a multiprocessing thing, so I set num_workers=1 but that didn’t solve it. I’m not really sure how to troubleshoot because things go from fine to dead kernel without any kind of error.

Might spin up an Ubuntu server and see if the issue persists there.


To deactivate multiprocessing, you have to set num_workers to zero, at least if I remember correctly.

‘background’ is listed twice? Okay, that is not what I intended. The fastai method generate_classes adds [‘background’] anyway and does not check whether it is already in the list of classes. Sorry about that.

But you can easily fix that by overriding the generate_classes method of the ObjectCategoryProcessor class.
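A minimal sketch of the idea, written as a plain function and not against the real class. `generate_classes_dedup` is a hypothetical name, and the input format (a list of `[boxes, labels]` pairs) is an assumption based on the label function above. In practice the same logic would go into a subclass that overrides `generate_classes`.

```python
def generate_classes_dedup(items):
    """Collect the unique labels and put a single 'background' first."""
    classes = set()
    for _boxes, labels in items:
        classes.update(labels)
    classes.discard("background")            # avoid listing it twice
    return ["background"] + sorted(classes)

# Made-up items in the assumed [boxes, labels] format.
items = [
    [[[0, 0, 0, 0]], ["background"]],        # empty image with the dummy box
    [[[10, 20, 110, 140]], ["cat"]],
    [[[5, 5, 50, 50], [60, 60, 90, 90]], ["dog", "cat"]],
]
classes = generate_classes_dedup(items)
print(classes)   # ['background', 'cat', 'dog']
```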

With kind regards,


Yes, thanks, it worked. The problem was that I had also imported PIL's Image, so it was not working with the fastai Image object.
A few more questions:

  1. My loss doesn't improve beyond a certain value, around 1.70.
    I doubt whether the matches function is assigning the ground-truth box correctly.
    In my case there is only a single object to detect and a single class.
    The final output of the matches function (which assigns ground-truth bboxes to anchors) gives me more than one anchor, so the predictions have shape n × 4, while the target has shape 1 × 4 (it is expanded to n × 4) before going to the regression loss.

I thought we should get only one anchor, the one that has the ground-truth box, by further applying max.
Here is my understanding of the matching, if I understand correctly:

bbox_tgt, clas_tgt = self._unpad(bbox_tgt, clas_tgt)
matches = match_anchors(self.anchors, bbox_tgt)
bbox_mask = matches>=0
if bbox_mask.sum() != 0:
    bbox_pred = bbox_pred[bbox_mask]
    bbox_tgt = bbox_tgt[matches[bbox_mask]]
    bb_loss = self.reg_loss(bbox_pred, bbox_to_activ(bbox_tgt, self.anchors[bbox_mask]))

a) Calculate the IoU between each ground-truth bbox and all possible anchor bboxes.
b) For each anchor, use the max IoU to pick which ground-truth object (if there are several) it matches best.
c) Apply a threshold to separate background from foreground, and assign the class id to every anchor that matches this object.
Now, shouldn't there be one more filter to keep only the anchor with the best IoU match to the ground truth of the class it is confident about?
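To make the steps above concrete, here is a small pure-Python sketch of this kind of matching. It is not the notebook's `match_anchors` (which works on tensors); the function names, the corner box format, the 0.5 / 0.4 thresholds, and the handling of images without targets are assumptions chosen for illustration. It shows that every anchor whose best IoU clears the threshold is kept as a positive, so a single ground-truth box can end up with several matched anchors.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match(anchors, targets, match_thr=0.5, bkg_thr=0.4):
    """Per anchor: index of the matched target, -1 for background, -2 for ignore."""
    if not targets:                      # assumption: no targets means all background
        return [-1] * len(anchors)
    matches = []
    for anchor in anchors:
        ious = [iou(anchor, t) for t in targets]
        best = max(range(len(ious)), key=ious.__getitem__)
        if ious[best] > match_thr:
            matches.append(best)         # foreground: remember which target
        elif ious[best] < bkg_thr:
            matches.append(-1)           # background
        else:
            matches.append(-2)           # in between: ignored by the loss
    return matches

anchors = [(0, 0, 10, 10), (2, 2, 12, 12), (1, 0, 11, 10), (50, 50, 60, 60), (4, 0, 14, 10)]
targets = [(1, 1, 11, 11)]               # a single ground-truth box
result = match(anchors, targets)
print(result)                            # [0, 0, 0, -1, -2]
```

Three anchors are matched to the one target here. Each of them contributes its own regression term, which is why the target is expanded to n × 4.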

  2. What is the purpose of the activation-to-bbox conversion and its inverse, bbox_to_activ?
    Why is that needed?
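As background for this question: detectors in this family usually regress offsets relative to each anchor instead of raw coordinates, which keeps the targets in a small, comparable range across anchor sizes. The sketch below shows one common encoding and its inverse. It assumes boxes in (center_x, center_y, width, height) form, uses hypothetical function names, and leaves out any extra scaling the notebook's own conversion functions may apply, so treat it as an illustration and not as the notebook's code.

```python
import math

def box_to_activ(box, anchor):
    """Encode a (cx, cy, w, h) box as offsets relative to an anchor."""
    cx, cy, w, h = box
    acx, acy, aw, ah = anchor
    return ((cx - acx) / aw, (cy - acy) / ah, math.log(w / aw), math.log(h / ah))

def activ_to_box(activ, anchor):
    """Decode offsets back into a (cx, cy, w, h) box."""
    tx, ty, tw, th = activ
    acx, acy, aw, ah = anchor
    return (acx + tx * aw, acy + ty * ah, aw * math.exp(tw), ah * math.exp(th))

anchor = (0.5, 0.5, 0.2, 0.2)
box = (0.55, 0.45, 0.3, 0.1)

activ = box_to_activ(box, anchor)        # what the regression head is trained to output
restored = activ_to_box(activ, anchor)   # what is done to the outputs at inference
print(activ)
print(restored)
```

The loss compares predictions with the encoded targets, and the decoding step turns the network outputs back into boxes for display or NMS.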

Thanks again @Bronzi88 for the notebook! I am currently working with a meter reading dataset, trying to extract the digits of the reading.
I was able to pretrain the RetinaNet by resizing all images to 256, since, as you said, it currently only works with 256x256 images.

I tried to use discriminative learning rates when training with learn.fit_one_cycle, but it throws a “division by zero” error.
Am I doing something wrong, or is this not implemented yet? You don’t use it in your notebook either.


It's probably worth checking your layer groups…
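One way this error can arise, as far as I can tell: when `max_lr` is a `slice`, fastai v1 spreads the rates geometrically across the layer groups, and with a single layer group the step computation divides by `n - 1 = 0`. The function below is a pure-Python approximation of that spreading logic, so its name and body are assumptions and not a copy of the library code.

```python
def even_mults(start, stop, n):
    """Build n log-spaced values from start to stop (inclusive)."""
    mult = stop / start
    step = mult ** (1 / (n - 1))     # n == 1 raises ZeroDivisionError here
    return [start * step**i for i in range(n)]

rates = even_mults(1e-5, 1e-3, 3)    # three layer groups: roughly 1e-5, 1e-4, 1e-3
print(rates)

try:
    even_mults(1e-5, 1e-3, 1)        # a model with a single layer group
    failed = False
except ZeroDivisionError:
    failed = True
print(failed)
```

If that is the cause, checking `len(learn.layer_groups)` and splitting the model into several groups before passing a `slice` should avoid it.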

This works very well, thanks for sharing!

Do you have any tips on inference? When we train we feed fixed size images (eg 256x256) to the network, but at test time our images aren’t always of this shape.

How did you handle working with other sizes of images? Predict for each possible slice of the image and combine bounding boxes? Also, did you use fastai’s ‘get_preds()’?

Dear Christian,
This is regarding the PascalVoc metric that you have incorporated into callbacks.py in your GitHub repository for object detection.

In the code below, I see that to overcome the performance issue with NMS, we pass only a subset of the predictions made for an image. But won't we miss out on better box predictions in this process? The sorting by score happens inside NMS, so shouldn't we first sort by score, then take the subset, and pass that to NMS?

# if the number is too high, evaluation is very slow
total_nms_examples = len(class_gt) * 3
bbox_pred = bbox_pred[:total_nms_examples]
scores = scores[:total_nms_examples]
preds = preds[:total_nms_examples]
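A sketch of the ordering suggested above: sort by score first, then keep the top entries, so the truncation cannot drop the most confident boxes. `top_k_by_score` is a hypothetical helper operating on plain lists with made-up data; the real callback works on tensors.

```python
def top_k_by_score(boxes, scores, labels, k):
    """Keep the k highest-scoring predictions, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return ([boxes[i] for i in order],
            [scores[i] for i in order],
            [labels[i] for i in order])

boxes = [(0, 0, 5, 5), (1, 1, 6, 6), (20, 20, 30, 30), (2, 2, 7, 7)]
scores = [0.2, 0.9, 0.6, 0.4]
labels = [1, 1, 2, 1]

kept_boxes, kept_scores, kept_labels = top_k_by_score(boxes, scores, labels, k=2)
print(kept_scores)       # [0.9, 0.6]
print(scores[:2])        # [0.2, 0.9]: plain slicing would drop the 0.6 prediction
```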

Since part 2 took a deep dive behind the curtains and didn’t include practical examples, will there be a part 3 where object detection and bounding boxes are covered for fast.ai v1 on a practical level?

It will be covered in the extra lessons.

Fantastic. Is there any ETA for when the extra lessons will be published? Didn’t notice any mention of it in the lectures or other materials. Just thinking if I should invest the time to learn it with other libraries or wait for the Fast.ai lessons to arrive.

No, we’re fully focused on the development and release of v2 first. Extra lessons will be after.


Got it, thanks. Looking forward to v2! Really appreciate what you guys do.

Same question. Did you solve it?

Hi @KarlH,

Did you solve the issue with the ‘background’ data? I could create a databunch with both labelled and unlabelled data. Everything seems ok but when I tried to fit the data I got the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-64-f250d77c386e> in <module>
----> 1 learn.fit_one_cycle(4, 1e-3, wd=1e-3)

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     21     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     22                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 23     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     25 def fit_fc(learn:Learner, tot_epochs:int=1, lr:float=defaults.lr,  moms:Tuple[float,float]=(0.95,0.85), start_pct:float=0.72,

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         else: self.opt.lr,self.opt.wd = lr,wd
    199         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102                 if cb_handler.on_batch_end(loss): break

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     29     if not loss_func: return to_detach(out), to_detach(yb[0])
---> 30     loss = loss_func(out, *yb)
     32     if opt is not None:

~/anaconda3/envs/Fastai/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/DeepLearning/ObjectDetection/loss/RetinaNetFocalLoss.py in forward(self, output, bbox_tgts, clas_tgts)
     53         focal_loss = torch.tensor(0, dtype=torch.float32).to(clas_preds.device)
     54         for cp, bp, ct, bt in zip(clas_preds, bbox_preds, clas_tgts, bbox_tgts):
---> 55             bb, focal = self._one_loss(cp, bp, ct, bt)
     57             bb_loss += bb

~/DeepLearning/ObjectDetection/loss/RetinaNetFocalLoss.py in _one_loss(self, clas_pred, bbox_pred, clas_tgt, bbox_tgt)
     29     def _one_loss(self, clas_pred, bbox_pred, clas_tgt, bbox_tgt):
---> 30         bbox_tgt, clas_tgt = self._unpad(bbox_tgt, clas_tgt)
     31         matches = match_anchors(self.anchors, bbox_tgt)
     32         bbox_mask = matches >= 0

~/DeepLearning/ObjectDetection/loss/RetinaNetFocalLoss.py in _unpad(self, bbox_tgt, clas_tgt)
     16     def _unpad(self, bbox_tgt, clas_tgt):
---> 17         i = torch.min(torch.nonzero(clas_tgt - self.pad_idx))
     18         return tlbr2cthw(bbox_tgt[i:]), clas_tgt[i:] - 1 + self.pad_idx

RuntimeError: invalid argument 1: cannot perform reduction function min on tensor with no elements because the operation does not have an identity at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathReduce.cu:64

From checking the forums, it seems to be something related to the images/labels, so I guess it is due to the [0,0,0,0] ['background'] trick. Any idea how to solve it?
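The traceback points at `_unpad`: `torch.nonzero(clas_tgt - self.pad_idx)` is empty when every class target equals the padding index, and `torch.min` cannot reduce an empty tensor. That would happen if 'background' ends up with the same index as the padding value, which seems plausible with the dummy-box trick. The list-based sketch below mimics that logic with a guard for the all-padding case. `unpad` and its return convention for empty images are assumptions, and the real method works on tensors and also converts the box format.

```python
def unpad(bbox_tgt, clas_tgt, pad_idx=0):
    """Drop leading padding entries; return empty lists if nothing else is left."""
    non_pad = [i for i, c in enumerate(clas_tgt) if c != pad_idx]
    if not non_pad:                      # image without any real object
        return [], []
    i = min(non_pad)                     # first entry that is not padding
    return bbox_tgt[i:], [c - 1 + pad_idx for c in clas_tgt[i:]]

# Padded targets for an image with one real object (class id 2).
with_object = unpad([[0, 0, 0, 0], [0, 0, 0, 0], [1, 2, 3, 4]], [0, 0, 2])
# Targets for an empty image: only the dummy box with the padding class.
empty_image = unpad([[0, 0, 0, 0]], [0])

print(with_object)   # ([[1, 2, 3, 4]], [1])
print(empty_image)   # ([], [])
```

With a guard like this, the loss would also need to treat an empty target list as "all anchors are background" instead of running the matching code on it.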

Very nice work by @Bronzi88. Running fastai version 1.0.57, I got the same error message as @Joan when trying to run the examples/CocoTiny_Retina_Net.ipynb notebook. I was able to fix it by using squishing for resizing and removing the default transforms. This seems to avoid bboxes ending up outside of the transformed image.

data = (ObjectItemList.from_folder(
        .transform(tfm_y=True, size=size, resize_method=ResizeMethod.SQUISH)  
        .databunch(bs=64, collate_fn=bb_pad_collate))