About bounding box localization

Hey @siv,

Well, I didn’t go much further with this heatmap approach, although it seemed to work fine. There’s one thing I saw in that notebook which is pretty interesting:

By combining the heatmaps at different scales, we obtain much better information about the location of the dog.

If you go and visit the link, you’ll see that they actually compute the heatmap at different sizes and then take their geometric average.
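For anyone curious, here is a minimal sketch of that combination step (assuming the per-scale heatmaps are already available as 2D numpy arrays with values in [0, 1]; the function and variable names below are made up, not taken from the notebook):

import numpy as np
from scipy.ndimage import zoom

def combine_heatmaps(heatmaps, target_shape):
    # Geometric mean of heatmaps computed at different scales.
    # heatmaps: list of 2D arrays (one per scale); target_shape: (H, W).
    resized = []
    for h in heatmaps:
        factors = (target_shape[0] / h.shape[0], target_shape[1] / h.shape[1])
        resized.append(zoom(h, factors, order=1))    # resize every map to the same grid
    stacked = np.stack(resized)                      # shape: (n_scales, H, W)
    eps = 1e-8                                       # avoid log(0)
    return np.exp(np.mean(np.log(stacked + eps), axis=0))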

Regarding localization, I just thought I should look into dedicated methods (Fast R-CNN, SSD) first before going further on heatmaps. And that’s where I am right now, on SSD: there’s a Keras implementation that works just fine here if you respect the environment :slight_smile: It can already localize 20 classes from PASCAL VOC, and you can also train it on custom classes. It is a very light notebook, and easy to go through. Quite fascinating. Once again, the longest part is making your training data fit the expected format, and manual labelling takes time too…

Didn’t go into Fast R-CNN yet, but that should be a step too!

Update on SSD custom training:

After manual labelling of only 50 pictures with 3 classes (car, bib_number and license_plate), training was smooth and the results are promising (especially for license_plate, which often gets recognized). car was already a class in the pre-trained network. bib_number is pretty hard… it would require more data, I guess!


Fascinating thread!

For the question around the t-SNE clustering bit, what about computing cluster centroids and checking the distance to the nearest cluster centroid? Where that exceeds the current max distance, tag the point as an outlier; otherwise, put it in that cluster. You could do cross-validation to check performance, but I’ve found that sort of cluster-centroid approach works quite well in applications outside of image processing…
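Something along these lines, as a rough sketch (the feature matrix, cluster count and threshold below are all placeholders to adapt):

import numpy as np
from sklearn.cluster import KMeans

def assign_or_flag_outliers(features, n_clusters=5, pct=95):
    # Cluster the embeddings, then flag points that sit too far
    # from their nearest centroid as outliers (label -1).
    km = KMeans(n_clusters=n_clusters, random_state=0).fit(features)
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    max_dist = np.percentile(dists, pct)     # proxy for the "current max distance"
    labels = km.labels_.copy()
    labels[dists > max_dist] = -1
    return labels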

Sure, that makes sense; k-NN could also be used (we’re in a semi-supervised case).
I didn’t try anything further though; I’m on supervised localization now (see the SSD post above). But I may come back to it.

Hey @arnaud
Fantastic work, man. Congratulations on getting SSD working for custom classes.

Can you walk me through the process of getting SSD to work with custom data, especially fitting the training data into the expected format? Thanks in advance.

Moreover, I’m in the middle of getting YOLO to work; I will update once done.


Hi @siv, thanks for your message. I’d be very keen on seeing a “tutorial” from you on YOLO then.

Regarding the Keras implementation, if you want to do a complete custom training on new images, here are the steps. I’ll assume you don’t even have a labeled dataset yet.

  • Label a certain number of your images

    • (OSX) RectLabel writes annotations in .json format, one per image.
    • (Ubuntu) labelImg does it in .xml or .json. Back then I used it with xml, and it was so painful to make it work on OSX that I keep an Ubuntu virtual machine just to run it (huh).
    • I’d welcome other labelling tools. I’m surprised there’s no fancy web-based app to do it. Does anyone here know of one? :wink:
  • Turn these annotations into the proper format expected by the SSD implementation. Nothing very hard here; you just need to understand that they save a dict as a pickle with the following scheme:

{
 "path/to/img1" : [ img1_bbox1, img1_bbox2 ], 
 "path/to/img2" : [ img2_bbox1, img2_bbox2, img2_bbox3 ], 
 ...
}

Here each img_bbox is itself a vector containing 4 + (C-1) values, where C is the number of classes. For example:

img1_bbox_1 = [ x_bl, y_bl, width, height, one_hot_encoding ]

Here (x_bl, y_bl) are the coordinates of the bottom-left point, and one_hot_encoding is… well, the one-hot encoding. If you have 2 classes, it will be a single value (one_hot_encoding = 0. or 1.); if you have 3 classes it will be two values, e.g. (1., 0.) or (0., 1.).
But remember img1_bbox_1 is a flattened list.
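To make it concrete, a (completely made-up) entry with that 3-class setup could look like this:

"path/to/img1" : [
    [  12.,  40., 150.,  90., 1., 0. ],   # a box of the first object class
    [  60.,  55.,  30.,  10., 0., 1. ],   # a box of the second object class
]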

Then you’re… done. You can feed that pickle and the path to your images to the SSD.
I have a function to translate either labelImg (xml) or RectLabel (json) annotations to that pickle format if you want. I can provide it tomorrow.
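In the meantime, here is a rough sketch of what such a converter can look like for labelImg-style PASCAL VOC xmls (the class list, paths, filenames and the exact bbox convention below are assumptions to adapt, not the actual function I use):

import os
import pickle
from xml.etree import ElementTree

# Assumption: the class count C includes a background class, so the one-hot
# part of each box has C-1 entries (one per real object class), as above.
classes = ['background', 'car', 'bib_number', 'license_plate']

def xml_to_boxes(xml_path):
    # Parse one labelImg / PASCAL VOC xml file into a list of flattened boxes.
    root = ElementTree.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        b = obj.find('bndbox')
        xmin, ymin = float(b.find('xmin').text), float(b.find('ymin').text)
        xmax, ymax = float(b.find('xmax').text), float(b.find('ymax').text)
        one_hot = [0.] * (len(classes) - 1)
        one_hot[classes.index(name) - 1] = 1.
        # 4 coordinates + (C-1) one-hot values, flattened.
        # Check the exact convention (corner + size vs. min/max, normalized
        # to [0, 1] or not) against the SSD implementation you use.
        boxes.append([xmin, ymin, xmax - xmin, ymax - ymin] + one_hot)
    return boxes

annotations = {}
xml_dir = 'path/to/annotations'        # made-up paths
img_dir = 'path/to/images'
for fname in os.listdir(xml_dir):
    if fname.endswith('.xml'):
        img_path = os.path.join(img_dir, fname.replace('.xml', '.jpg'))
        annotations[img_path] = xml_to_boxes(os.path.join(xml_dir, fname))

with open('gt_custom.pkl', 'wb') as f:  # made-up filename
    pickle.dump(annotations, f)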

Hope that helps. Sometimes (often?) it seems like figuring out what goes in and what comes out is the hardest part… 95% of the time spent preparing the data, 5% training :stuck_out_tongue:


hi all,
I’m very new to machine learning.
I’ve created a simple CNN and obtained the weight file by following “building-powerful-image-classification-models-using-very-little-data” on the Keras blog.
As a second step, I’m planning to use a large image and identify the objects in it using the trained weight file. The identified objects should be marked with a bounding box or some other method. As far as I know, this is achieved using a sliding window? I’ve been stuck at this step for over a month and am still unable to get the implementation working. Would you kindly help me with this, please?

Thanks,
Gayan

The two most popular and well-known tools I am aware of are ImageJ, which is cross-platform, has been used in academia and research for years, has great documentation and allows you to script within it in multiple languages, and Sloth, which is cross-platform, has been used in academia and research for years, has great documentation, etc . . .

Don’t forget figuring out a fancy new loss function to give you an edge over the competition =)

Hi,

Thanks for introducing RectLabel.
Now we support the PASCAL VOC xml format.

Key features:

  • Create a label dialog from settings
  • Settings for objects, attributes and format
  • Support the PASCAL VOC format
  • Layer order for overlapped boxes
  • Zoom in on a point
  • Quick zoom to existing boxes
  • Smart guides for creating and transforming boxes

Hi Gayan,

tl;dr
If you want to run this kind of localization algorithm, you may want to use Fast R-CNN or SSD first, rather than coding it on your own.


This is exactly the problem that algorithms like Fast R-CNN and SSD are trying to solve.

If you go naively for a brute-force sliding window, you may end up with terrible computation times (looking at every possible sub-window of an image is… long). Indeed, the hard part is finding good bounding box candidates: for that, Fast R-CNN looks for ROIs (Regions of Interest), and SSD looks at a fixed set of well-chosen generic candidates (its default boxes). Once you have the bounding box candidates, you just classify them and keep those with a good enough ‘probability’.
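To make the brute-force option concrete, here is roughly what the naive sliding window looks like (the classifier, window sizes and threshold are placeholders); the number of crops to classify explodes quickly, which is exactly what Fast R-CNN’s region proposals and SSD’s fixed default boxes avoid:

import numpy as np

def sliding_window_detect(image, classify, window_sizes=(64, 128, 256), stride=16, threshold=0.9):
    # Brute-force detector: run the classifier on every crop at every scale.
    # image: HxWx3 array; classify: function mapping a crop to class probabilities.
    H, W = image.shape[:2]
    detections = []
    for size in window_sizes:
        for y in range(0, H - size, stride):
            for x in range(0, W - size, stride):
                crop = image[y:y + size, x:x + size]
                probs = classify(crop)                  # one forward pass per crop!
                c = int(np.argmax(probs))
                if probs[c] > threshold:
                    detections.append((x, y, size, c, float(probs[c])))
    return detections

For a 1000x1000 image and a stride of 16, this is already several thousand forward passes per scale, before any non-maximum suppression; SSD instead scores its fixed set of default boxes (around 8.7k for the 300x300 model) in a single forward pass.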

Hope that’s clear. Although I’m not quite sure what your question was :slight_smile:

Does anybody have an explanation of SSD (Single Shot MultiBox Detector)? I can’t understand what exactly is happening.
The first thing I’m stuck on is what a ‘default box’ is. What is the output of that model?

(UP) Anything to say about YOLO? :slight_smile:

@jeremy
Is it possible to use the same technique for telling fastai what the bounding boxes are, now that we’re using PyTorch? Lesson 7 shows how to display Class Activation Maps but doesn’t show how to pass in the boundaries if they’re already known for an image.

Thnx

Second this! Really loved watching pt 2 of the videos. Would love to see how to get the bounding box using the fastai framework.

I just finished an implementation of SSD with Keras this week.
I can send you the wiki page I made explaining how it works in detail (I need to review it one more time to make sure there are no mistakes).

Thanks Ben!

Just found this as well: I did a search for bbox in the new repo and found it there too.

@jeremy I’m a little bit stuck. How do I look at the predicted bounding boxes after using learn.fit? Also, how do I pass in an individual image using this method?

I tried the method that worked on the classification task, but it didn’t seem to work here:

trn_tfms, val_tfms = tfms_from_model(arch, sz)   # transforms matching the model/architecture

img = Image.open(PATH + fn)
im = trn_tfms(np.array(img))                     # apply the training transforms
preds = learn.predict_array(im[None])            # im[None] adds a batch dimension
y = np.argmax(preds)                             # argmax only makes sense for classification
data.classes[y]

  • Update
  1. The outputs of the learn.predict() method are the bounding box positions.
  2. This worked for me, following a previous answer on the forum, to get an individual prediction (a small sketch for drawing the resulting box is below):

io_img = io.imread(img_url)                               # load the image from a URL
im = self.trn_tfms(np.array(io_img) / 255.0)              # apply the training transforms
preds = to_np(self.learn.models.model(V(T(im[None]))))    # forward pass with a batch dimension

  • I’m using it in an API, so mine may look a little different.
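To actually look at a predicted box, something like this can work as a rough sketch (assuming preds comes back as a flat [x, y, width, height] in the image’s pixel coordinates; adapt it to whatever your model really outputs):

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_prediction(img, bbox):
    # Draw one predicted box given as (x, y, width, height) on top of the image.
    fig, ax = plt.subplots()
    ax.imshow(img)
    x, y, w, h = bbox
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=2))
    plt.show()

show_prediction(io_img, preds[0])   # preds[0]: first (and only) item in the batch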

Here is the document SSD-Description.pdf (2.4 MB).

Please note that it is not a tutorial on how to implement SSD, but a summary of the information I collected while studying the model. I hope it is still helpful.

Regarding the implementation, I encourage everyone to look at the following repo:

It is extremely useful to understand the details.


Hi there, I’d like to ask: I’ve implemented SSD too, to detect a custom object, but how can I crop out the part of the image detected by the box? Is there a way?