SSD Object Detection in V1

(J. Adolfo Villalobos) #1

In the 2018 Advanced Course (Part 2), Professor Howard presented an example of SSD (Single Shot Detector) object detection. It was a very interesting, albeit complex, notebook; unfortunately, it was written in Fastai V0.7.

I have ported the SSD notebook to Fastai V1, and it works very well. You can find it in my GitHub at

This was the most challenging model in the course and also a wonderful learning tool. I hope it will be of value to the community. In developing the model, I was inspired by previous work by Henin.


(Phuc Ng. Su) #2

Page not found! Please check the link.

(J. Adolfo Villalobos) #3

My apologies. Try again and let me know.

(Allan Jackson) #4

Jose: It looks like you have put a tremendous amount of work into this notebook. It runs smoothly.
There is, however, a typographic error on code line number 39. It is:
learn.load(f'/content/gdrive/My Drive/model-SSD/ssd_unfreeze_2_2', strict=True)
Line number 39 should be:
learn.load(f'/content/gdrive/My Drive/ssd_unfreeze_2_2', strict=True)
In addition, line 66 is: '/content/gdrive/My Drive/ssd_unfreeze_7)
Line number 66 should be: '/content/gdrive/My Drive/ssd_unfreeze_7')
Please keep up the great work!

(J. Adolfo Villalobos) #5

Thank you for pointing out the errors, Allan. I believe they are leftovers from my debugging frenzy.

In the meantime, I discovered another bug of my own:
Line 105: if is_unfreeze(learn):. The correct statement should be: if is_unfreeze(learner):
As it is, the method works because learn is a valid variable used elsewhere in the notebook, but if you lift the find_optimal_lr method for use in another notebook, it will fail.

If you are interested in object detection, you know that SSD is fast and relatively accurate. However, it is not very good at detecting small objects in images. I have been playing with the model and found that by adding a couple of extra conv layers to the layers that detect small objects, I can improve the model’s small-object detection power. I call them “booster layers”. Try it. Replace the ssd_model method with this one:

class ssd_model(nn.Module):
    def __init__(self, arch=models.resnet34, k=9, drop=0.4, no_cls=21):
        super().__init__()
        self.k = k
        self.body = create_body(arch)
        self.drop = nn.Dropout(drop)   # it was 0.25

        self.std_conv_0 = conv2_std_layer(num_features_model(self.body), 256, drop=drop, stride=1)
        # Dimension-reducing layers
        self.std_conv_1 = conv2_std_layer(256, 256, drop=drop, stride=2)  # 4 by 4 layer
        self.std_conv_1_1 = conv_layer(256, 256, stride=1)  # Booster layer
        self.std_conv_2 = conv2_std_layer(256, 256, drop=drop, stride=2)  # 2 by 2 layer
        self.std_conv_1_2 = conv_layer(256, 256, stride=1)  # Booster layer
        self.std_conv_3 = conv2_std_layer(256, 256, drop=drop, stride=2)  # 1 by 1 layer
        # SSD layers
        self.ssd_conv_1 = conv2_ssd_layer(256, k=self.k, no_cls=no_cls)
        self.ssd_conv_2 = conv2_ssd_layer(256, k=self.k, no_cls=no_cls)
        self.ssd_conv_3 = conv2_ssd_layer(256, k=self.k, no_cls=no_cls)

    def forward(self, xb):
        xb = self.drop(F.relu(self.body(xb)))
        xb = self.std_conv_0(xb)
        xb = self.std_conv_1(xb)
        xb = self.std_conv_1_1(xb)
        bb1, cls1 = self.ssd_conv_1(xb)  # 4 x 4
        xb = self.std_conv_2(xb)
        xb = self.std_conv_1_2(xb)
        bb2, cls2 = self.ssd_conv_2(xb)  # 2 x 2
        xb = self.std_conv_3(xb)
        bb3, cls3 = self.ssd_conv_3(xb)  # 1 x 1
        return [torch.cat([bb1, bb2, bb3], dim=1),
                torch.cat([cls1, cls2, cls3], dim=1)]
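For reference, fastai v1’s conv_layer defaults to a 3x3, stride-1 convolution followed by batch norm and ReLU, so a booster layer adds capacity without shrinking the grid. A standalone sketch in plain PyTorch (standing in for the fastai helper, not taken from the notebook):

```python
import torch
import torch.nn as nn

# Rough equivalent of conv_layer(256, 256, stride=1): a stride-1 3x3 conv
# with padding keeps the spatial size, so the 4x4 detection grid is unchanged.
booster = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
)

xb = torch.randn(2, 256, 4, 4)   # a dummy 4x4 feature map
out = booster(xb)                # still (2, 256, 4, 4)
```

Because the grid size is preserved, the number of anchors and detections per cell stays the same; only the features feeding the SSD head get richer.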


(Alonso) #6

Thank you very much for this, Jose. This is great. Please, could you explain to us how to use the learning rate finder?


(J. Adolfo Villalobos) #7

Hi, Alonso,

As you know, a learning rate is most effective when it is selected at a point on the lr graph where the negative slope is largest. What the find_optimal_lr function does is calculate all these inflection points along the lr graph and display them for you to choose from.
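Since find_optimal_lr is a custom function from the notebook, here is only a toy numpy sketch of the idea behind it, with a made-up loss curve: candidate learning rates sit where the slope of loss versus log(lr) is most negative.

```python
import numpy as np

# Toy loss-vs-learning-rate curve with a dip near lr = 1e-3
# (illustrative only; not the notebook's implementation).
lrs = np.logspace(-6, -1, 200)
log_lrs = np.log10(lrs)
losses = 2.0 - np.exp(-(log_lrs + 3.0) ** 2)

# Slope of the loss with respect to log(lr); the steepest descent
# (most negative slope) marks a strong candidate learning rate.
grads = np.gradient(losses, log_lrs)
best_lr = lrs[np.argmin(grads)]   # lands on the left slope of the dip
```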

To use the function, first run lr_find, then run find_optimal_lr.

When you start training a model, you should choose the largest lr value possible. So, looking at the graph, you should choose one of the rightmost inflection points, with one caveat: do not pick a point that is too close to where the loss or gradient graph shoots up (within an order of magnitude or so). Such points train the model poorly.

You can pick points from either graph: the loss (blue) or the gradient (yellow). You will notice that there are points on the blue and yellow graphs with almost the same lr value. These are the stronger inflection points.

I usually select an lr value and then train the model for 10 or 20 epochs. Then I run lr_find and find_optimal_lr again and select another lr. I do this because inflection points tend to drift along the graph as you train the model.

As a general rule, you choose the largest lr value possible, especially when you start training the model; however, once your model has been trained for multiple epochs, it is sometimes best to choose an lr in the middle or on the left side of the graph. You have to play it by ear. AI is full of caveats.

If you select a pair of values for use in a slice, choose two inflection points that are one or two orders of magnitude apart.
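As a rough numeric illustration (the values are made up, not from the notebook): take 1e-4 and 1e-2, two orders of magnitude apart, and note how fastai spreads a slice multiplicatively across layer groups:

```python
import numpy as np

# Hypothetical pair of inflection points, two orders of magnitude apart.
lr_low, lr_high = 1e-4, 1e-2

# fastai interprets slice(lr_low, lr_high) as learning rates spread
# geometrically across the layer groups; for three groups that gives:
lrs = np.geomspace(lr_low, lr_high, num=3)
# lrs -> [1e-4, 1e-3, 1e-2]: earlier layers train slower, later ones faster
```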

By the way, there is another post in this blog, entitled Selecting Learn Rates in FASTAI, that contains additional details about what the function does.

Hope this helps.



(Alonso) #8

Thank you very much, Jose.


(Tantrum) #9

Thanks a LOT for your notebook, it really helped me, great job!!!
But how can I predict new images using this model? I’ve been trying to use this code

but got this error
I found out how to predict on the validation dataset, but I need to predict on completely new images.

UPD: Solved!


(J. Adolfo Villalobos) #10

Sorry for the delay in answering. I am traveling in Russia, armed with only an iPad. I am glad you found a solution using Torch. If you want to use Fastai for testing, the easiest way is to use the power of the Data Block to normalize and prepare the input images. The procedure is as follows:

  1. Go to the folder where the Pascal dataset is kept. I believe it is .fastai/data/Pascal_2007. Create a folder with a name other than Test (Test is used by Pascal to hold the validation images).
    I named it “test_real”. Use this folder to keep all the external images you want to test.
  2. Create a simple databunch to read the test images.
  3. Cycle through the test_dl that was created by the databunch.
  4. Display the images without NMS.
  5. Predict the test images using NMS.
    The complete code is shown below:
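On the normalization in step 2: imagenet_stats is simply the standard ImageNet channel means and standard deviations, so what normalize(imagenet_stats) does to each test image can be sketched in plain PyTorch (a minimal stand-in, not the Data Block code itself):

```python
import torch

# Standard ImageNet per-channel statistics (what fastai calls imagenet_stats).
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(img):
    """Normalize a CHW float image in [0, 1] the way the databunch would."""
    return (img - mean) / std

img = torch.rand(3, 224, 224)   # a dummy resized test image
x = normalize(img)              # ready to feed to the ResNet body
```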


Edit: There is an easy fix for this. In the function def single_ssd_loss(pbox, plabel, box, label), replace:

gt_clas[1-pos] = len(data.classes)

with:

gt_clas[~pos] = len(data.classes)

I tried to run your notebook, but encountered this error at Cell 24 …something with the Loss Function … what does it mean?
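For context on why the 1-pos to ~pos change above works: newer PyTorch versions make comparison masks bool tensors, and subtraction (1 - pos) raises an error on a bool tensor, while ~pos inverts the mask. A minimal sketch with made-up values:

```python
import torch

# pos marks which anchors matched a ground-truth box.
pos = torch.tensor([True, False, True])
gt_clas = torch.tensor([3, 7, 5])

# Unmatched anchors get the background class index,
# e.g. len(data.classes) in the notebook (21 here as a stand-in).
gt_clas[~pos] = 21
# gt_clas is now [3, 21, 5]
```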


(Tantrum) #13

pip install torch==1.1


(J. Adolfo Villalobos) #14

Hi, I am sorry you are having trouble running the code. It is my fault. I should not have given advice when I did not have a computer at hand or access to the code. Please bear with me for a few days. As soon as I get back to the US, I will consolidate all changes and enhancements into a single notebook. It will then be easy to read and test. Regards.

(Tantrum) #15

How to convert predicted bounding box into the real sized rectangle?
When I predict a bounding box for some image, I first need to resize the image to (224, 224) to get an ImageBBox object.

How to convert tensor([[-0.6111, -0.4920, 0.5020, 0.9398]]) into the rectangle corresponding to the real sized image, not (224, 224)? I want to open my image using PIL and then cut out an area inside of predicted bounding box
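One possible way to do that, assuming (as in fastai v1’s convention) that the box is [y_min, x_min, y_max, x_max] scaled to [-1, 1] relative to the image: map each coordinate back through the original height and width. The 500x375 image size below is a made-up example:

```python
import torch

def to_pixels(box, height, width):
    """Convert a fastai-style [-1, 1] box [y0, x0, y1, x1] to pixel coords."""
    y0, x0, y1, x1 = [float(v) for v in box]
    top = (y0 + 1) / 2 * height
    left = (x0 + 1) / 2 * width
    bottom = (y1 + 1) / 2 * height
    right = (x1 + 1) / 2 * width
    return (left, top, right, bottom)   # PIL's crop() argument order

box = torch.tensor([-0.6111, -0.4920, 0.5020, 0.9398])
coords = to_pixels(box, height=375, width=500)
# e.g. PIL.Image.open(fname).crop(coords) on the original 500x375 image
```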