Object detection of single class with RetinaNet

I want to detect signatures on photos and I cannot get good results. I have 60 labels in the train dataset and cannot get anything out of it, while Google AutoML does a decent job.

I tried changing the ratios and scales for the boxes, to no avail. Any thoughts?

Also, when I plot bboxes, the images from dls.valid_ds are not resized; how do I get resized ones?
Answer: with `Resize(size, method=ResizeMethod.Squish)(img)`

My notebook

I am looking at the predicted boxes and they seem too big, even though I used very low scales:

scales = [0.25, 0.5, 0.75]
ratios = [0.25, 0.5, 1]

What’s happening?
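For reference, here is a minimal sketch (plain Python, not fastai's actual anchor code) of one common way scales and aspect ratios are combined into anchor shapes in RetinaNet-style models, assuming the area-preserving scheme where ratio = h/w. Whether the resulting sizes are relative to a feature-map cell or to the whole image depends on the implementation:

```python
import math

def anchor_shapes(scales, ratios):
    """Combine every scale with every aspect ratio (h/w) into (w, h) pairs.

    Area-preserving scheme: w = scale / sqrt(ratio), h = scale * sqrt(ratio),
    so each anchor at a given scale covers roughly the same area.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return shapes

# anchor_shapes([0.25], [0.25]) -> [(0.5, 0.125)]
for w, h in anchor_shapes([0.25, 0.5, 0.75], [0.25, 0.5, 1]):
    print(f"w={w:.3f} h={h:.3f}")
```

Printing the shapes like this for your scales/ratios is a quick way to sanity-check whether the anchors are actually as small as you intended.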

I think something similar happened to me last year. You can use k-means clustering with IoU as the distance metric to automate finding anchor boxes for you. This is usually done in YOLO models. Here’s one link you can refer to: https://lars76.github.io/object-detection/k-means-anchor-boxes/

If I remember correctly, I used k-means to find both anchor ratios and sizes. It worked quite well!
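A minimal NumPy sketch of that idea (the same technique as in the linked post, not its actual code): cluster the ground-truth (w, h) pairs using 1 - IoU as the distance, where every box is imagined as sharing the same top-left corner:

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between (w, h) boxes and (w, h) centroids, all boxes
    treated as if they share the same top-left corner."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    areas = boxes[:, 0] * boxes[:, 1]
    c_areas = clusters[:, 0] * clusters[:, 1]
    return inter / (areas[:, None] + c_areas[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs with 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # nearest centroid = highest IoU
        assign = np.argmax(iou_wh(boxes, clusters), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else clusters[i] for i in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters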


OK, so the deal is that V2 uses x1,y1,x2,y2 bbox notation while V1 uses y1,x1,y2,x2.
And I used the RetinaNet code from V1.
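If anyone hits the same thing: the swap between the two notations is symmetric, so a tiny adapter (a hypothetical helper, not part of fastai) is enough to shuttle boxes between V1-style and V2-style code:

```python
def yxyx_to_xyxy(box):
    """Convert a V1-style (y1, x1, y2, x2) box to V2-style (x1, y1, x2, y2)."""
    y1, x1, y2, x2 = box
    return (x1, y1, x2, y2)

# The swap is its own inverse, so the same function converts either way:
print(yxyx_to_xyxy((10, 20, 30, 40)))  # -> (20, 10, 40, 30)
```

This only fixes the point order at the boundary, of course; any V1 function that assumes y-first internally still needs auditing.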


You should be able to pass in `y_first=True` to fix that


But that doesn’t help, since the RetinaNet code has a bunch of functions using V1 notation, and `y_first` doesn’t change the order of points in a databunch.

I don’t see any way around it except rewriting the RetinaNet code. Maybe rewriting TensorBBox could help.

You may be better off there, but I still don’t quite understand why this would be an issue. Both V1 and V2 should be outputting the same thing. Can you make a databunch using the same base file and show what its output bounding box tensors look like? I’ve looked at this extensively and their endpoints should be the same.

I’ll double check, but from what I see the V2 databunch order is not the same as V1’s.

Yep, I confirm that the order in databunches is different for object detection.

Also, these functions return sizes in different orders:
`Transform(PILImage.create)()` - w,h
`open_image()` - h,w

I hate this issue of different styles of bounding box representation. Someone should standardise it! It gets very confusing sometimes, especially when writing code to generate anchor boxes and loss functions.

I would also recommend reimplementing it yourself; that’s what I did as well.
Btw, does anyone know why fastai uses a range of -1 to 1 for anchor boxes instead of the standard 0 to 1 range?
I used the 0 to 1 range for anchor boxes in my implementation and it worked fine.

When we transform points, they’re normalized to the -1 to 1 range

That’s because we use tanh, isn’t it? I have used sigmoid and it works too. Is there any specific reason for using tanh?

No, that’s because it’s what PyTorch uses for functions like grid_sample and affine_grid, so we follow that convention.
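For the curious, a minimal sketch of that mapping under the align_corners=True convention (the one grid_sample/affine_grid document), in plain Python:

```python
def to_grid_range(coord, size):
    """Map a pixel coordinate in [0, size - 1] to the [-1, 1] range used by
    torch.nn.functional.grid_sample / affine_grid (align_corners=True):
    -1 is the left/top edge, +1 is the right/bottom edge."""
    return 2 * coord / (size - 1) - 1

print(to_grid_range(0, 224))    # -> -1.0 (left/top edge)
print(to_grid_range(223, 224))  # ->  1.0 (right/bottom edge)
```

So keeping anchors in -1..1 just means fastai can hand coordinates straight to those PyTorch functions; a 0..1 convention works too as long as you convert consistently.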
