Dynamic SSD implementation for fastai v1

(Rohit Singh) #1

Hi all,

@divyansh and I have implemented a dynamic Single Shot Detector (SSD) for fastai v1, based on Part 2, Lesson 9 (pascal-multi).

The implementation is available in the dev notebook at https://github.com/rohitgeo/singleshotdetector/blob/master/SingleShotDetector%20on%20Pascal.ipynb

A simple 4x4 grid with one anchor box per grid cell can be created using
simple_ssd = SingleShotDetector(data, grids=[4], zooms=[1.0], ratios=[[1.0, 1.0]])

A full SSD can be created using
ssd = SingleShotDetector(data, grids=[4, 2, 1], zooms=[0.7, 1., 1.3], ratios=[[1., 1.], [1., 0.5], [0.5, 1.]])

The constructor allows specifying any number of grid sizes, zoom levels and aspect ratios for the anchor boxes and creates the appropriate network architecture.
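To get a feel for how these three arguments combine, here is a rough, framework-free sketch of anchor generation in the style of the Lesson 9 notebook. `make_anchors` is a hypothetical helper, not the repo's code, and the (height, width) order of each ratio pair is an assumption.

```python
from itertools import product

def make_anchors(grids, zooms, ratios):
    """Return (cy, cx, h, w) anchors in [0, 1] image coordinates.

    Hypothetical helper: every grid cell gets one anchor per
    (zoom, ratio) combination, centred on the cell.
    """
    anchors = []
    for g in grids:
        cell = 1.0 / g  # side length of one grid cell
        for row, col in product(range(g), repeat=2):
            cy, cx = (row + 0.5) * cell, (col + 0.5) * cell
            for zoom, (rh, rw) in product(zooms, ratios):
                anchors.append((cy, cx, cell * zoom * rh, cell * zoom * rw))
    return anchors

simple = make_anchors([4], [1.0], [[1.0, 1.0]])
full = make_anchors([4, 2, 1], [0.7, 1.0, 1.3],
                    [[1.0, 1.0], [1.0, 0.5], [0.5, 1.0]])
print(len(simple), len(full))
```

Under these assumptions the simple configuration yields 16 anchors (4x4 cells, one each), and the full one yields (16 + 4 + 1) cells times 9 anchors per cell, i.e. 189.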

Let us know if you’d like to see this in a PR.



Hello @rohitgeo, wonderful stuff you started here. I’ve been looking for a V1 implementation for object detection.

I am running your notebook as-is, but I get the following error after calling ssd.lr_find():

  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py", line 50, in bb_pad_collate
    bboxes[i,-len(lbls):] = bbs
RuntimeError: The expanded size of the tensor (13) must match the existing size (0) at non-singleton dimension 0.  Target sizes: [13, 4].  Tensor sizes: [0, 4]

Have you encountered this problem?

(Rohit Singh) #3

Hi @hud, this could happen if there is an image that does not have any bboxes in it. A fix for this was recently added to fastai - see https://github.com/fastai/fastai/pull/1526/files
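The traceback points at `bboxes[i,-len(lbls):] = bbs`. One way to see why an empty box list breaks that line: when `len(lbls)` is 0, the slice becomes `[-0:]`, which is the same as `[0:]` and selects the whole row rather than nothing. A minimal illustration with plain lists standing in for tensors (`tail` is a hypothetical helper used only here):

```python
max_len = 13
row = list(range(max_len))  # stands in for one padded row of boxes

def tail(row, n):
    "Elements selected by row[-n:], the slice used for left-padding."
    return row[-n:]

three = tail(row, 3)  # the last three slots, as intended
zero = tail(row, 0)   # -0 == 0, so this is row[0:], i.e. all 13 slots
print(len(three), len(zero))
```

Assigning a [0, 4] tensor into all 13 slots is then a shape mismatch, which matches the "Target sizes: [13, 4]. Tensor sizes: [0, 4]" part of the message.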

Otherwise, you could use our patched collate function instead:

import torch

def _bb_pad_collate(samples, pad_idx=0):
    "Function that collect `samples` of labelled bboxes and adds padding with `pad_idx`."
    # Count the boxes per sample; treat samples whose labels cannot be read as empty.
    arr = []
    for s in samples:
        try:
            arr.append(len(s[1].data[1]))
        except Exception as e:
            # set_trace()
            # print(s[1].data[1],s[1].data[1],e)
            arr.append(0)
    max_len = max(arr)
#    max_len = max([len(s[1].data[1]) for s in samples])
    bboxes = torch.zeros(len(samples), max_len, 4)
    labels = torch.zeros(len(samples), max_len).long() + pad_idx
    imgs = []
    for i,s in enumerate(samples):
        imgs.append(s[0].data[None])
        bbs, lbls = s[1].data
        # print(bbs, lbls)
        try:
            bboxes[i,-len(lbls):] = bbs
            labels[i,-len(lbls):] = lbls
        except Exception as e:
            # Image without bboxes: keep the all-padding row.
            pass
    return torch.cat(imgs,0), (bboxes,labels)


Hello @rohitgeo, sorry, I just got back and saw your message.

With your _bb_pad_collate, the bounding boxes were not displayed.

That fastai fix should be included in the latest fastai release, right? I still get the same error after updating, but adding the fastai fix directly in the notebook works:

def bb_pad_collate(samples:BatchSamples, pad_idx:int=0) -> Tuple[FloatTensor, Tuple[LongTensor, LongTensor]]:
    "Function that collect `samples` of labelled bboxes and adds padding with `pad_idx`."
    if isinstance(samples[0][1], int): return data_collate(samples)
    max_len = max([len(s[1].data[1]) for s in samples])
    bboxes = torch.zeros(len(samples), max_len, 4)
    labels = torch.zeros(len(samples), max_len).long() + pad_idx
    imgs = []
    for i,s in enumerate(samples):
        imgs.append(s[0].data[None])
        bbs, lbls = s[1].data
        if not (bbs.nelement() == 0):
            bboxes[i,-len(lbls):] = bbs
            labels[i,-len(lbls):] = tensor(lbls)
    return torch.cat(imgs,0), (bboxes,labels)

(Rohit Singh) #5

@hud, thanks for confirming that it works in the notebook. The fastai code is undergoing many changes, so it's a moving target, and some recent change is causing the bounding boxes to not show up. Our code was written with fastai 1.0.39 and should work with that version.

(Vu Ha) #6

Nice implementation @rohitgeo! Are you planning to add common metrics such as mAP as well? It’d be super helpful to compute them and compare to results in the literature to gain confidence that the implementation is correct.

(Rohit Singh) #7

Yes, we are working on mAP

(Vu Ha) #8

Why do you normalize the bbox during loss computation: bbox = (bbox + 1.)/2.?

(Rohit Singh) #9

This converts bbox from the [-1,1] range to the [0,1] range that the code expects.
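As a quick sanity check of that mapping, here is a tiny sketch on plain Python floats. `normalize_bbox` is a hypothetical stand-in for the repo's `_normalize_bbox`, whose exact body is assumed here to be the expression quoted above.

```python
def normalize_bbox(bbox):
    "Map each coordinate from the [-1, 1] range to the [0, 1] range."
    return [(v + 1.0) / 2.0 for v in bbox]

# -1 (top/left edge) maps to 0, 0 (centre) to 0.5, and 1 (bottom/right edge) to 1.
print(normalize_bbox([-1.0, 0.0, 0.5, 1.0]))
```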

(Vu Ha) #10

Is it then also necessary to normalize the output from actn_to_bb (a_ic) in calculating the L1 loss?

I put in assertions for normalized bbox being non-negative, which passed, and non-normalized a_ic being non-negative, which failed.

def _ssd_1_loss(self, b_c, b_bb, bbox, clas, print_it=False):
        bbox,clas = self._get_y(bbox,clas)
        bbox = self._normalize_bbox(bbox)

        a_ic = self._actn_to_bb(b_bb, self._anchors, self._grid_sizes)
        overlaps = self._jaccard(bbox.data, self._anchor_cnr.data)
        try:
            gt_overlap,gt_idx = self._map_to_ground_truth(overlaps,print_it)
        except Exception as e:
            return 0.,0.
        gt_clas = clas[gt_idx]
        pos = gt_overlap > 0.4
        pos_idx = torch.nonzero(pos)[:,0]
        gt_clas[1-pos] = 0 #data.c - 1 # CHANGE
        gt_bbox = bbox[gt_idx]
        loc_loss = ((a_ic[pos_idx] - gt_bbox[pos_idx]).abs()).mean()
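For readers following the matching step, here is a framework-free sketch of the Jaccard (IoU) overlap and the `> 0.4` threshold that decides which anchors count as positives. It works on single boxes in (top, left, bottom, right) order, which is an assumption; the repo's `_jaccard` operates on batched tensors, and `_map_to_ground_truth` may apply further rules that are not shown here.

```python
def iou(a, b):
    "Jaccard overlap of two boxes given as (top, left, bottom, right)."
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection height
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection width
    inter = ih * iw
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

anchor = (0.0, 0.0, 0.5, 0.5)
gt_close = (0.0, 0.0, 0.5, 0.4)    # mostly covers the anchor
gt_far = (0.25, 0.25, 0.75, 0.75)  # overlaps only one corner of it

# Same threshold as `pos = gt_overlap > 0.4` in the loss above.
is_positive = [iou(anchor, g) > 0.4 for g in (gt_close, gt_far)]
print(is_positive)
```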

(xnet) #11

Has anyone tried other backbones? I’ve tried Resnet101 and got the following:

RuntimeError: Given groups=1, weight of size [256, 512, 3, 3], expected input[32, 2048, 7, 7] to have 512 channels, but got 2048 channels instead

Any advice on how to troubleshoot these error messages? Thanks!

(Vu Ha) #12

When you switch backbones, the tensor shapes in the forward pass may change. I'd recommend looking closely at the interface between ResNet101's last layers and the SSDHead's first layers.
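Reading the error message this way: the head's first convolution has weight shape [256, 512, 3, 3], so it expects 512 input channels, while the ResNet101 body hands it 2048. In torchvision's ResNets the last stage outputs 512 channels times the block expansion factor (1 for the BasicBlock in ResNet18/34, 4 for the Bottleneck in ResNet50/101/152). A small sketch of that arithmetic follows; `backbone_out_channels` is a hypothetical helper, not part of fastai or the SSD repo.

```python
# Block expansion factors of torchvision's ResNet variants.
EXPANSION = {
    "resnet18": 1, "resnet34": 1,                    # BasicBlock
    "resnet50": 4, "resnet101": 4, "resnet152": 4,   # Bottleneck
}

def backbone_out_channels(arch):
    "Channels produced by the final ResNet stage (512 base planes)."
    return 512 * EXPANSION[arch]

print(backbone_out_channels("resnet34"), backbone_out_channels("resnet101"))
```

So a head built for a 512-channel body would presumably need its first convolution's input channels changed to 2048 before it can sit on top of ResNet101.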

(Divyansh Jha) #13

No, it is not necessary, because a_ic is computed from the model's output. So as long as we normalize the model's input and train it on those normalized bboxes, we are fine.

a_ic can be negative initially, when the model is not yet trained. Even after training, some elements of a_ic can be negative or greater than the size of the image. This simply means that predicted bboxes can extend outside the image, which doesn't do any harm. If you plot them, you'll see that some bboxes get clipped at the edges.
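To make this concrete, here is a scalar sketch of an activation-to-box mapping in the style of the Lesson 9 notebook's `actn_to_bb`. It is assumed, not verified, that the repo's `_actn_to_bb` behaves similarly; the anchor layout (cy, cx, h, w) and the example anchor are illustrative choices.

```python
import math

def actn_to_bb(actn, anchor, grid_size):
    """Map 4 raw activations to (top, left, bottom, right) corners.

    anchor is (cy, cx, h, w) in [0, 1]; grid_size is the cell side length.
    """
    t = [math.tanh(a) for a in actn]
    cy = t[0] / 2 * grid_size + anchor[0]  # centre moves at most half a cell
    cx = t[1] / 2 * grid_size + anchor[1]
    h = (t[2] / 2 + 1) * anchor[2]         # size scales by 0.5x to 1.5x
    w = (t[3] / 2 + 1) * anchor[3]
    return (cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2)

# Top-left cell of a 4x4 grid with zoom 1.3 and ratio (1, 1).
anchor = (0.125, 0.125, 0.325, 0.325)
box = actn_to_bb([0.0, 0.0, 0.0, 0.0], anchor, 0.25)

# Clipping to the image, as a plotting routine might do.
clipped = tuple(min(max(v, 0.0), 1.0) for v in box)
print(box, clipped)
```

Even with all-zero activations, this corner anchor already extends past the top-left edge of the image, so a non-negativity assertion on a_ic would fail without anything being wrong.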