About bounding box localization

Hey @siv,

Well, I didn’t go much further with this heatmap approach, although it seemed to work fine. There’s one thing I saw in that notebook which is pretty interesting:

By combining the heatmaps at different scales, we obtain much better information about the location of the dog.

If you go and visit the link, you’ll see that they actually compute the heatmap at different sizes and then take their geometric average.
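For anyone curious, here is a minimal sketch of that combination step (assuming the per-scale heatmaps are already available as 2D numpy arrays with values in [0, 1]; the function and variable names below are made up, not taken from the notebook):

import numpy as np
from scipy.ndimage import zoom

def combine_heatmaps(heatmaps, target_shape):
    # Geometric mean of heatmaps computed at different scales.
    # heatmaps: list of 2D arrays (one per scale); target_shape: (H, W).
    resized = []
    for h in heatmaps:
        factors = (target_shape[0] / h.shape[0], target_shape[1] / h.shape[1])
        resized.append(zoom(h, factors, order=1))    # resize every map to the same grid
    stacked = np.stack(resized)                      # shape: (n_scales, H, W)
    eps = 1e-8                                       # avoid log(0)
    return np.exp(np.mean(np.log(stacked + eps), axis=0))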

Regarding localization, I just thought I should look into dedicated methods (Fast R-CNN, SSD) first before going further on heatmaps. And that’s where I am right now, on SSD: there’s a Keras implementation that works just fine here if you respect the environment :slight_smile: It can already localize 20 classes from PASCAL VOC, and you can also train it on custom classes. It is a very light notebook, and easy to go through. Quite fascinating. Once again, the longest part is making your training data fit the expected format, and manual labelling takes time too…

Didn’t go into Fast R-CNN yet, but that should be a step too!

Update on SSD custom training:

After manual labelling of only 50 pictures with 3 classes (car, bib_number and license_plate), training was smooth and the results are promising (especially for license_plate, which often gets recognized). car was already a class in the pre-trained network. bib_number is pretty hard… it would require more data, I guess!


Fascinating thread!

For the question around the t-SNE clustering bit, what about computing cluster centroids and checking the distance to the nearest cluster centroid? Where that exceeds the current max distance, tag the point as an outlier; otherwise, put it in that cluster. You could do cross-validation to check performance, but I’ve found that sort of cluster-centroid approach works quite well in applications outside of image processing…
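Something along these lines, as a rough sketch (the feature matrix, cluster count and threshold below are all placeholders to adapt):

import numpy as np
from sklearn.cluster import KMeans

def assign_or_flag_outliers(features, n_clusters=5, pct=95):
    # Cluster the embeddings, then flag points that sit too far
    # from their nearest centroid as outliers (label -1).
    km = KMeans(n_clusters=n_clusters, random_state=0).fit(features)
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    max_dist = np.percentile(dists, pct)     # proxy for the "current max distance"
    labels = km.labels_.copy()
    labels[dists > max_dist] = -1
    return labels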

Sure, that makes sense; k-NN could also be used (we’re in a semi-supervised case).
I didn’t try anything further though; I’m on supervised localization now (see the SSD post above). But I may come back to it.

Hey @arnaud
Fantastic work, man. Congratulations on getting SSD working for custom classes.

Can you walk me through the process of getting SSD to work with custom data, especially fitting the training data into the expected format? Thanks in advance.

Moreover, I’m in the middle of getting YOLO to work; I will update once done.


Hi @siv, thanks for your message. I’d be very keen on seeing a “tutorial” from you on YOLO then.

Regarding the Keras implementation, if you want to do a complete custom training on new images, here are the steps. I’ll assume you don’t even have a labeled dataset yet.

  • Label a certain number of your images

    • (OSX) RectLabel writes annotations in .json format, one per image.
    • (Ubuntu) labelImg does it in .xml or .json. Back then I used it with xml, and it was so painful to make it work on OSX that I keep an Ubuntu virtual machine just to run it (huh).
    • I’d welcome other labelling tools. I’m surprised there’s no fancy web-based app to do it. Does anyone here know of one? :wink:
  • Turn these annotations into the proper format expected by the SSD implementation. Nothing very hard here; you just need to understand that they save a dict as a pickle with the following scheme:

{
 "path/to/img1" : [ img1_bbox1, img1_bbox2 ], 
 "path/to/img2" : [ img2_bbox1, img2_bbox2, img2_bbox3 ], 
 ...
}

Here each img_bbox is itself a vector containing 4 + (C-1) values, where C is the number of classes. For example:

img1_bbox_1 = [ x_bl, y_bl, width, height, one_hot_encoding ]

Here (x_bl, y_bl) are the coordinates of the bottom-left point, and one_hot_encoding is… well, the one-hot encoding. If you have 2 classes, it will be a single value (one_hot_encoding = 0. or 1.); if you have 3 classes it will be two values, e.g. (1., 0.) or (0., 1.).
But remember img1_bbox_1 is a flattened list.
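To make it concrete, a (completely made-up) entry with that 3-class setup could look like this:

"path/to/img1" : [
    [  12.,  40., 150.,  90., 1., 0. ],   # a box of the first object class
    [  60.,  55.,  30.,  10., 0., 1. ],   # a box of the second object class
]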

Then you’re… done. You can feed that pickle and the path to your images to the SSD.
I have a function to translate either labelImg (xml) or RectLabel (json) annotations to that pickle format if you want. I can provide it tomorrow.
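In the meantime, here is a rough sketch of what such a converter can look like for labelImg-style PASCAL VOC xmls (the class list, paths, filenames and the exact bbox convention below are assumptions to adapt, not the actual function I use):

import os
import pickle
from xml.etree import ElementTree

# Assumption: the class count C includes a background class, so the one-hot
# part of each box has C-1 entries (one per real object class), as above.
classes = ['background', 'car', 'bib_number', 'license_plate']

def xml_to_boxes(xml_path):
    # Parse one labelImg / PASCAL VOC xml file into a list of flattened boxes.
    root = ElementTree.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        b = obj.find('bndbox')
        xmin, ymin = float(b.find('xmin').text), float(b.find('ymin').text)
        xmax, ymax = float(b.find('xmax').text), float(b.find('ymax').text)
        one_hot = [0.] * (len(classes) - 1)
        one_hot[classes.index(name) - 1] = 1.
        # 4 coordinates + (C-1) one-hot values, flattened.
        # Check the exact convention (corner + size vs. min/max, normalized
        # to [0, 1] or not) against the SSD implementation you use.
        boxes.append([xmin, ymin, xmax - xmin, ymax - ymin] + one_hot)
    return boxes

annotations = {}
xml_dir = 'path/to/annotations'        # made-up paths
img_dir = 'path/to/images'
for fname in os.listdir(xml_dir):
    if fname.endswith('.xml'):
        img_path = os.path.join(img_dir, fname.replace('.xml', '.jpg'))
        annotations[img_path] = xml_to_boxes(os.path.join(xml_dir, fname))

with open('gt_custom.pkl', 'wb') as f:  # made-up filename
    pickle.dump(annotations, f)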

Hope that helps. Sometimes (often?) it seems like figuring out what goes in and what comes out is the hardest part… 95% of the time spent preparing the data, 5% training :stuck_out_tongue:


hi all,
I’m very new to machine learning.
I’ve created a simple CNN and obtained the weight file by following “building-powerful-image-classification-models-using-very-little-data” on the Keras blog.
As a second step, I’m planning to use a large image and identify the objects in it using the trained weight file. The identified objects should be marked with a bounding box or some other method. As far as I know, this is achieved using a sliding window? I’ve been stuck at this step for over a month and am still unable to get the implementation working. Would you kindly help me with this, please?

Thanks,
Gayan

The two most popular and well-known tools I am aware of are ImageJ, which is cross-platform, has been used in academia and research for years, has great documentation and allows you to script within it in multiple languages, and Sloth, which is cross-platform, has been used in academia and research for years, has great documentation, etc . . .

Don’t forget figuring out a fancy new loss function to give you an edge over the competition =)

Hi,

Thanks for introducing RectLabel.
Now we support the PASCAL VOC xml format.

Key features:

  • Create a label dialog from settings
  • Settings for objects, attributes and format
  • Support the PASCAL VOC format
  • Layer order for overlapped boxes
  • Zoom in on a point
  • Quick zoom to existing boxes
  • Smart guides for creating and transforming boxes

Hi Gayan,

tl;dr
If you want to run this kind of localization algorithm, you may want to use Fast R-CNN or SSD first, rather than coding it on your own.


This is exactly the problem that algorithms like Fast R-CNN and SSD are trying to solve.

If you go naively for a brute-force sliding window, you may end up with terrible computation times (looking at every possible sub-window of an image is… long). Indeed, the hard part is finding good bounding box candidates: for that, Fast R-CNN looks for ROIs (Regions of Interest), and SSD looks at a fixed set of well-chosen generic candidates (its default boxes). Once you have the bounding box candidates, you just classify them and keep those with a good enough ‘probability’.
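To make the brute-force option concrete, here is roughly what the naive sliding window looks like (the classifier, window sizes and threshold are placeholders); the number of crops to classify explodes quickly, which is exactly what Fast R-CNN’s region proposals and SSD’s fixed default boxes avoid:

import numpy as np

def sliding_window_detect(image, classify, window_sizes=(64, 128, 256), stride=16, threshold=0.9):
    # Brute-force detector: run the classifier on every crop at every scale.
    # image: HxWx3 array; classify: function mapping a crop to class probabilities.
    H, W = image.shape[:2]
    detections = []
    for size in window_sizes:
        for y in range(0, H - size, stride):
            for x in range(0, W - size, stride):
                crop = image[y:y + size, x:x + size]
                probs = classify(crop)                  # one forward pass per crop!
                c = int(np.argmax(probs))
                if probs[c] > threshold:
                    detections.append((x, y, size, c, float(probs[c])))
    return detections

For a 1000x1000 image and a stride of 16, this is already several thousand forward passes per scale, before any non-maximum suppression; SSD instead scores its fixed set of default boxes (around 8.7k for the 300x300 model) in a single forward pass.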

Hope that’s clear. Although I’m not quite sure what your question was :slight_smile:

Does anybody have an explanation of SSD (Single Shot MultiBox Detector)? I can’t understand what exactly is happening.
The first thing I’m stuck on is what a ‘default box’ is. What is the output of that model?

(UP) Anything to say about YOLO? :slight_smile:

@jeremy
Is it possible to use the same technique for telling fastai what the bounding boxes are, now that we’re using PyTorch? Lesson 7 shows how to display Class Activation Maps but doesn’t show how to pass in the boundaries if they’re already known for an image.

Thnx

Second this! Really loved watching pt 2 of the videos. Would love to see how to get the bounding box using the fastai framework.

I just finished an implementation of SSD with Keras this week.
I can send you the wiki page I made explaining how it works in detail (I need to review it one more time to make sure there are no mistakes).

Thanks Ben!

Just found this as well: I did a search for bbox in the new repo and found it there too.

@jeremy I’m a little bit stuck. How do I look at the predicted bounding boxes after using learn.fit? Also, how do I pass in an individual image using this method?

I tried the method that worked on the classification task, but it didn’t seem to work here:

trn_tfms, val_tfms = tfms_from_model(arch, sz)   # transforms matching the model/architecture

img = Image.open(PATH + fn)
im = trn_tfms(np.array(img))                     # apply the training transforms
preds = learn.predict_array(im[None])            # im[None] adds a batch dimension
y = np.argmax(preds)                             # argmax only makes sense for classification
data.classes[y]

  • Update
  1. The outputs of the learn.predict() method are the bounding box positions.
  2. This worked for me, following a previous answer on the forum, to get an individual prediction (a small sketch for drawing the resulting box is below):

io_img = io.imread(img_url)                               # load the image from a URL
im = self.trn_tfms(np.array(io_img) / 255.0)              # apply the training transforms
preds = to_np(self.learn.models.model(V(T(im[None]))))    # forward pass with a batch dimension

  • I’m using it in an API, so mine may look a little different.
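To actually look at a predicted box, something like this can work as a rough sketch (assuming preds comes back as a flat [x, y, width, height] in the image’s pixel coordinates; adapt it to whatever your model really outputs):

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_prediction(img, bbox):
    # Draw one predicted box given as (x, y, width, height) on top of the image.
    fig, ax = plt.subplots()
    ax.imshow(img)
    x, y, w, h = bbox
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=2))
    plt.show()

show_prediction(io_img, preds[0])   # preds[0]: first (and only) item in the batch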

Here is the document SSD-Description.pdf (2.4 MB).

Please note that it is not a tutorial on how to implement SSD, but a summary of the information I collected while studying the model. I hope it is still helpful.

Regarding the implementation, I encourage everyone to look at the following repo:

It is extremely useful to understand the details.


Hi there, I’d like to ask: I’ve implemented SSD too, to detect a custom object, but how can I crop out the part of the image detected by the box? Is there a way?