Object detection in fastai

I trained RetinaNet with fastai (by following Jeremy's notebook in the GitHub repo) and got an acceptable mAP (around 0.3). But upon using it for inference on some images and my webcam video, its performance seemed very, very bad. Here's an example of that:

It seems that it's bad at detecting small objects in particular.
Can anyone explain the reason for this? Are there maybe some transforms that I'm not applying to the input image? Or some other reason, perhaps…

Here is the inference code (a bit messy, sorry):


In your pascal_detection.py you did not normalize the image, apart from rescaling. Did you use any normalization of the images in training?

Edit: I looked at fastai_docs/pascal.ipynb and no normalization was used there either. I didn't enroll in the course, so I haven't seen the video. Do you know the reason for not using normalization there? I think normalization would be better, because we use a pretrained feature extractor that was itself trained with normalized inputs.


All I did was .div_(255); I don't know if that counts as normalization. I also don't know exactly which transforms fastai applies to the images during preprocessing.
I didn't take the course either, but I think since Jeremy used images from his validation dataset, he doesn't need to apply transforms again since they are already transformed.

Should I use torchvision's .Normalize() with a mean of 0 and std of 1?

You should use the exact same normalization as in your training. In the current pascal.ipynb I cannot find any point where the dataset is normalized (it should be at DataBunch creation). If it is not normalized in training, then you should not normalize your inference image either.
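For reference, if a pretrained backbone had been trained with ImageNet statistics and the training pipeline matched them (which, per the above, pascal.ipynb apparently does not), the inference-side normalization would look roughly like this. The mean/std values below are the standard ImageNet ones, not something taken from the notebook:

```python
import torch

# Standard ImageNet statistics commonly used with pretrained backbones
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225])

def normalize(img: torch.Tensor) -> torch.Tensor:
    """Scale a uint8 CHW image to [0, 1], then apply ImageNet mean/std."""
    img = img.float() / 255.0
    return (img - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
```

The key point is that whichever variant training used (plain /255 or mean/std normalization) must be reproduced exactly at inference, or the feature distributions the network sees will be shifted.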


Oh, you're right, I didn't notice that. Jeremy must have some reason behind that, though. I also figured out why I get no bounding boxes: I put in the wrong coordinates for the rectangle. I'm using cv2.rectangle and the notebook is using patches.Rectangle. I tried to adapt the coordinates to cv2.rectangle and messed up.

When I tested this image with my code I got one big bounding box :rofl:
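For anyone hitting the same coordinate mix-up: patches.Rectangle takes a corner point plus a width and height, while cv2.rectangle takes two opposite corner points. Assuming boxes stored in (x, y, w, h) form, the conversion is a small sketch like this:

```python
def rect_to_corners(x, y, w, h):
    """Convert a patches.Rectangle-style (x, y, width, height) box
    into the two opposite corner points that cv2.rectangle expects."""
    return (int(x), int(y)), (int(x + w), int(y + h))

# pt1, pt2 can then be passed straight to cv2.rectangle(img, pt1, pt2, ...)
pt1, pt2 = rect_to_corners(10, 20, 30, 40)
```

Note that fastai notebooks sometimes store boxes as (y_min, x_min, y_max, x_max) instead, so it is worth printing one box and checking which convention you actually have before converting.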

Hi there, I wonder if anyone can help me please.

I have adapted the RetinaNet model (using focal loss) for multi-object detection and my results are quite poor. The notebook I used can be found here: https://github.com/fastai/course-v3/blob/master/nbs/dl2/pascal.ipynb I have obviously changed parts to use my data, etc., and the original image sizes are 640 x 640.

The aim of the model is to detect the wooden poles used in street lighting. The images I am training on contain labels for trees and wooden light poles; the trees are labelled to help the model distinguish between wooden poles and actual trees, since poles can look like tree branches.

I have tried running the model on two different types of dataset:

The original has 839 Training images and 160 Validation Images.

The second used imgaug to augment each of the images in the original dataset, doubling it to 1678 images in the training set and 320 images in the validation set.

I ran three versions of the model:

  1. Without any transformations and using the original dataset.
  2. Using get_transforms with the original dataset.
  3. Using the augmented dataset and using the get_transforms method.

Anchor Boxes:

I believe my problem may come from the anchors I am using in the model. There is a function called create_anchors which will create a set of 9 anchors using different scales and ratios.

However, I find this function rather frustrating for the following reason:

In the images I have labelled, poles are always long, thin and rectangular, whereas tree bounding boxes can span the entire width of the image: they can be massive or small, and cover larger regions than the poles do. The problem is getting anchors to fit the bounding boxes of both poles and trees, which seems impossible with this function because it offers very little control. I can get, say, 5 well-shaped pole anchors by entering some sizes and aspect ratios, but then the remaining 4 generated anchors are not a good enough size to fit my tree bounding boxes, and therefore the trees don't get detected by the model.
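If I understand create_anchors correctly, the shapes come from the cross-product of a scale list and a ratio list, which is exactly why pole and tree shapes cannot be picked independently. A simplified reconstruction of that idea (my own sketch, not the notebook's code):

```python
import itertools

def make_anchor_shapes(scales, ratios):
    """Build (width, height) anchor shapes from every scale/ratio pair.
    ratio is interpreted as height/width, so ratio > 1 gives tall,
    thin anchors (pole-like) and ratio < 1 gives wide ones.
    Each anchor keeps area scale**2 regardless of its ratio."""
    shapes = []
    for scale, ratio in itertools.product(scales, ratios):
        h = scale * (ratio ** 0.5)
        w = scale / (ratio ** 0.5)
        shapes.append((w, h))
    return shapes

# 3 scales x 3 ratios -> the usual 9 anchors per grid cell
anchors = make_anchor_shapes([0.1, 0.25, 0.5], [0.5, 1.0, 4.0])
```

Because every ratio is combined with every scale, choosing ratios that suit tall thin poles forces those same ratios onto the scales you wanted for trees, which matches the lack of control described above.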

I decided to choose anchors that purely fit the shape of poles, as shown below, and I got the following Average Precisions:

Pole = 4.7 and Trees = 0, using No Augmentations on the Original Dataset
Pole = 11.16 and Trees = 0, using the Get_Aug transformations on the Original Dataset
Pole = 20.32 and Trees = 0, using the Augmented Dataset

As expected, the trees weren’t detected due to the shapes of the anchor boxes.

I changed the ratios and scales again for the anchors to try and get some shapes that would fit both the tree and pole objects as shown below and I got the following Average Precisions:

Pole = 9.14 and Trees = 1.9, using No Augmentations on the Original Dataset
Pole = 7.59 and Trees = 3.2, using the Get_Aug transformations on the Original Dataset
Pole = 13.76 and Trees = 3, using the Augmented Dataset

As can be seen, the AP for the poles got worse and the trees were beginning to be detected. I can’t seem to get a good balance between anchors for poles and trees with this create_anchors function.

I have a few questions as a result:

  1. Does the RetinaNet model have to have 9 anchors per grid cell, or can this be changed to have as many anchors as I want?
  2. Has anybody come across or developed a function that will allow me to create custom anchors, in my case for trees and poles separately, and would they mind me using/adapting it? I have dabbled with creating a new function to achieve this, but with no success.
  3. If changing the anchor sizes like this is going to be a problem, would it be a good idea to train a model to detect trees only, then use this as a pretrained model that I could train further to detect poles only?
  4. Does anybody know how I could accomplish question 3? Using my own pretrained model for object detection.
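On question 1: as far as I know, nothing in RetinaNet forces exactly 9 anchors per cell; 9 is just the 3-scales-times-3-ratios default, and the count only has to match what the detection head was built with. A hypothetical sketch of hand-picking anchor shapes per object class instead of using a cross-product (make_custom_anchor_shapes is my own name, not a fastai function):

```python
def make_custom_anchor_shapes(pole_shapes, tree_shapes):
    """Concatenate hand-picked (width, height) anchor shapes for each
    object class. The total can be any number, as long as the detection
    head is built with the same n_anchors."""
    return list(pole_shapes) + list(tree_shapes)

# e.g. 5 tall thin pole anchors plus 6 large tree anchors -> 11 anchors
anchors = make_custom_anchor_shapes(
    pole_shapes=[(0.05, 0.4), (0.05, 0.6), (0.08, 0.5),
                 (0.06, 0.8), (0.10, 0.9)],
    tree_shapes=[(0.3, 0.3), (0.5, 0.4), (0.8, 0.6),
                 (1.0, 0.8), (0.6, 0.5), (0.9, 0.7)],
)
```

The caveat is that the classification head then needs n_anchors x n_classes output channels per cell and the box head n_anchors x 4, so the model has to be rebuilt (and retrained) whenever the anchor count changes.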

I think I may have just solved the problem by modifying the create_anchors function.

Could you please share your saved stage model? I don't have the data you trained the model with.

Sorry - I am not able to share the data for confidentiality reasons, but thank you.

I am hoping somebody can help me, because I am banging my head against a brick wall :stuck_out_tongue:
I am using the RetinaNet notebook (link posted in my post above). After running different experiments my results were getting worse and nowhere near what I got the first time round, so I decided to do a check.
I looked at the model where I got good results, an mAP of 0.36.
So I took the same model, same notebook, same data, changed NOTHING, and my results are terrible: mAP 0.15. Has anyone come across this before? I haven't changed a thing, but I know something must be different and I can't seem to find out what it is!
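For what it's worth, run-to-run swings like this are often just unseeded randomness (weight initialisation, data shuffling, augmentation order, cuDNN non-determinism) rather than an actual code change. A common PyTorch recipe for pinning the seeds before training, offered as a generic sketch rather than something from the notebook:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Pin the RNGs that drive weight init, shuffling and augmentation
    so repeated runs start from the same state."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for determinism in cuDNN convolutions
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
```

Even with all seeds pinned, some GPU ops remain non-deterministic, so two runs may still differ slightly; but a swing from 0.36 to 0.15 mAP with identical code and data strongly suggests unseeded initialisation or data ordering.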