I didn’t go much further with this heatmap approach, although it seemed to work fine. There’s one thing I saw in that notebook that is pretty interesting:
By combining the heatmaps at different scales, we get much better information about the location of the dog.
If you visit the link, you’ll see that they actually compute the heatmap at different sizes and then take their geometric mean.
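Here is a minimal NumPy sketch of that combination. It assumes each heatmap is a 2-D array of per-location “dog” probabilities, and that nearest-neighbour upsampling to a common grid is good enough (the notebook may resize differently). resize_nearest and combine_heatmaps are hypothetical names:

```python
import numpy as np

def resize_nearest(heatmap, shape):
    """Nearest-neighbour resize of a 2-D array to `shape` = (H, W)."""
    h, w = heatmap.shape
    H, W = shape
    rows = np.arange(H) * h // H  # source row for each target row
    cols = np.arange(W) * w // W  # source column for each target column
    return heatmap[np.ix_(rows, cols)]

def combine_heatmaps(heatmaps, shape, eps=1e-8):
    """Geometric mean of heatmaps computed at different scales, on a common grid."""
    stacked = np.stack([resize_nearest(hm, shape) for hm in heatmaps])
    # Geometric mean = exp(mean(log(x))); eps avoids log(0)
    return np.exp(np.log(stacked + eps).mean(axis=0))
```

A single scale that scores a location low drags the geometric mean toward 0, so the combination rewards locations on which all scales agree.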
Regarding localization, I thought I should first look into dedicated methods (Fast R-CNN, SSD) before going further with heatmaps. That’s where I am right now, on SSD: there’s a Keras implementation here that works just fine if you respect its environment. It can already localize the 20 PASCAL VOC classes, and you can also train it on custom classes. It is a very light notebook and easy to go through. Quite fascinating. Once again, the longest part is getting your training data into the expected format, and manual labelling takes time too…
I haven’t gone into Fast R-CNN yet, but it should be a next step too!
After manually labelling only 50 pictures with 3 classes (car, bib_number and license_plate), training was smooth and the results are promising (especially for license_plate, which often gets recognized). car was already a class in the pre-trained network. bib_number is pretty hard… It would require more data, I guess!
For the question about the t-SNE clustering bit, what about computing cluster centroids and checking each new point’s distance to the nearest centroid? If that distance exceeds the current maximum distance for that cluster, tag the point as an outlier; otherwise put it in that cluster. You could use cross-validation to check performance, but I’ve found this sort of centroid approach works quite well in applications outside of image processing…
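A sketch of that idea in NumPy. It assumes `points` holds the 2-D t-SNE embedding and `labels` the existing cluster assignments, and it reads “current max distance” as the largest member-to-centroid distance of each cluster. fit_centroids and assign are hypothetical helpers. Note that scikit-learn’s TSNE has no transform method for unseen points, so new images would need to be embedded together with the old ones.

```python
import numpy as np

def fit_centroids(points, labels):
    """Return cluster ids, their centroids, and each cluster's max member-to-centroid distance."""
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    max_dist = np.array([np.linalg.norm(points[labels == k] - c, axis=1).max()
                         for k, c in zip(ids, centroids)])
    return ids, centroids, max_dist

def assign(point, ids, centroids, max_dist):
    """Nearest-centroid assignment; returns None (outlier) if the point lies
    farther from its nearest centroid than that cluster's max distance."""
    d = np.linalg.norm(centroids - point, axis=1)
    i = int(np.argmin(d))
    return ids[i] if d[i] <= max_dist[i] else None
```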
Sure, that makes sense; k-NN could also be used (we’re in a semi-supervised setting).
I didn’t try anything further though; I’m working on supervised localization now (see the SSD post above). But I may come back to it.
Hey @arnaud
Fantastic work, man. Congratulations on getting SSD working for custom classes.
Can you walk me through the process of getting SSD to work with custom data, especially fitting the training data into the expected format? Thanks in advance.
Also, I’m in the middle of getting YOLO to work; I’ll post an update once it’s done.
Hi @siv, thanks for your message. I’d be very keen on seeing a “tutorial” from you on YOLO then.
Regarding the Keras implementation, if you want to do a complete custom training on new images, here are the steps. I’ll assume you don’t even have a labeled dataset yet.
Label some of your images:
(OSX) RectLabel writes annotations in .json format, one file per image.
(Ubuntu) labelImg writes them in .xml or .json. Back then I used it with XML; it was very painful to make it work on OSX, so I run it in an Ubuntu virtual machine (huh).
I’d welcome suggestions for other labelling tools. I’m surprised there’s no fancy web-based app for it. Does anyone here know one?
Turn these annotations into the format expected by the SSD implementation. Nothing very hard here; you just need to understand that it saves a dict as a pickle with the following scheme:
Here (x_bl, y_bl) are the coordinates of the bottom-left point, and one_hot_encoding is… well, the one-hot encoding of the box’s class: one value per object class, 1. for the box’s class and 0. everywhere else. With 2 object classes it will be (1. 0.) or (0. 1.); with 3 classes, (1. 0. 0.), (0. 1. 0.) or (0. 0. 1.).
But remember that img1_bbox_1 is a flattened list: the box coordinates followed by the one-hot values.
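A sketch of building such a pickle, assuming a dict that maps each image filename to a NumPy array with one row per box, each row being the 4 box coordinates followed by the one-hot class vector. CLASSES, to_row, build_gt and save_gt are hypothetical names; the coordinate order and any normalisation should match whatever the SSD notebook expects.

```python
import pickle
import numpy as np

# Hypothetical class list (object classes only, in a fixed order)
CLASSES = ['car', 'bib_number', 'license_plate']

def to_row(box, class_name, classes=CLASSES):
    """Flatten one box into [x_bl, y_bl, x_tr, y_tr, one-hot...] (coordinate order assumed)."""
    one_hot = [1.0 if c == class_name else 0.0 for c in classes]
    return list(box) + one_hot

def build_gt(annotations, classes=CLASSES):
    """{filename: [((x_bl, y_bl, x_tr, y_tr), class_name), ...]}
    -> {filename: array of shape (n_boxes, 4 + len(classes))}."""
    return {fn: np.array([to_row(box, name, classes) for box, name in boxes])
            for fn, boxes in annotations.items()}

def save_gt(gt, path):
    """Pickle the ground-truth dict so it can be fed to the training code."""
    with open(path, 'wb') as f:
        pickle.dump(gt, f)
```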
Then you’re… done. You can feed that pickle and the path to your images to the SSD.
I have a function that translates either labelImg (XML) or RectLabel (JSON) annotations into that pickle format, if you want. I can provide it tomorrow.
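That function isn’t in this thread. As a hypothetical stand-in for the labelImg half, this sketch uses the standard library to parse one Pascal VOC style XML file, the layout labelImg writes: size/width, size/height, and one object/bndbox per box. Dividing by the image size to get coordinates in [0, 1] is an assumption; drop it if your implementation expects pixels. parse_voc_xml is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_text, classes):
    """Parse one labelImg (Pascal VOC style) annotation into (filename, rows).
    Each row is [xmin, ymin, xmax, ymax] divided by the image size,
    followed by a one-hot vector over `classes`."""
    root = ET.fromstring(xml_text)
    width = float(root.find('size/width').text)
    height = float(root.find('size/height').text)
    rows = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        bb = obj.find('bndbox')
        coords = [float(bb.find(tag).text) / scale
                  for tag, scale in (('xmin', width), ('ymin', height),
                                     ('xmax', width), ('ymax', height))]
        rows.append(coords + [1.0 if c == name else 0.0 for c in classes])
    return root.find('filename').text, rows
```

For a file on disk, pass `open(path).read()`. The returned filename and rows can go straight into the dict that gets pickled.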
Hope that helps. Sometimes (often?) it seems like figuring out what goes in and what comes out is the hardest part… 95% of the time spent preparing the data, 5% training.
Hi all,
I’m very new to machine learning.
I’ve created a simple CNN and obtained the weights file by following the “building-powerful-image-classification-models-using-very-little-data” post on the Keras blog.
As a second step, I’m planning to take a large image and identify the objects in it using the trained weights. The identified objects should be marked with a bounding box or some other method. As far as I know, this is done with a sliding window? I’ve been stuck at this step for over a month and still can’t get the implementation working. Could you kindly help me with this, please?
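To make the naive approach concrete, here is a brute-force, single-scale sliding window in NumPy. classify is a hypothetical callable that would wrap the trained Keras model (for example, resizing the crop to the CNN’s input size and calling model.predict on a batch of one) and return the probability of the object class. win, stride and threshold are illustrative values.

```python
import numpy as np

def sliding_window_detect(image, classify, win=64, stride=32, threshold=0.9):
    """Slide a fixed-size square window over `image` (H x W x C array), score each
    crop with `classify` (crop -> probability of the object class) and return
    the boxes (x1, y1, x2, y2, score) whose score reaches `threshold`."""
    H, W = image.shape[:2]
    boxes = []
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            score = classify(image[y:y + win, x:x + win])
            if score >= threshold:
                boxes.append((x, y, x + win, y + win, score))
    return boxes
```

Real objects come in many sizes, so this loop has to be repeated for several window sizes (or on a rescaled image pyramid), and that is where the cost explodes. An image of width W and height H contains W(W+1)/2 × H(H+1)/2 axis-aligned sub-rectangles, which for a 300×300 image is already 2,038,522,500.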
The two most popular and well-known tools I am aware of are ImageJ, which is cross-platform, has been used in academia and research for years, has great documentation and lets you script within it in multiple languages, and Sloth, which is also cross-platform, has been used in academia and research for years, has great documentation, etc.
Don’t forget coming up with a fancy new loss function to give you an edge over the competition =)
tl;dr
If you want to run this kind of localization algorithm, you may want to start from Fast R-CNN or SSD rather than coding it on your own.
This is exactly the problem that algorithms like Fast R-CNN and SSD are trying to solve.
If you go naively with a brute-force sliding window, you may end up with terrible computation times (looking at every possible sub-window of an image takes… a long time). Indeed, the hard part is finding good bounding box candidates: Fast R-CNN works on proposed ROIs (Regions Of Interest), while SSD evaluates a fixed set of well-chosen generic candidates (default boxes of several scales and aspect ratios tiled over its feature maps). Once you have the bounding box candidates, you just classify them and keep those with a good enough ‘probability’.
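In the SSD paper those generic candidates are called default boxes. Every cell of several feature maps gets a few boxes of different aspect ratios, at a scale tied to that feature map. For each default box, the network predicts one score per class plus four offsets that adjust the box. The sketch below is simplified and covers a single square feature map. It leaves out the paper’s extra aspect-ratio-1 box and its per-layer scale schedule, and the values are illustrative. default_boxes is a hypothetical name.

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes for one square feature map, as (cx, cy, w, h) in [0, 1]
    image coordinates. Each cell gets one box per aspect ratio, all of area scale**2."""
    boxes = []
    for i in range(fmap_size):          # feature-map row
        for j in range(fmap_size):      # feature-map column
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell centre
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)))
    return np.array(boxes)
```

For example, a 3×3 map with three aspect ratios yields 27 boxes, each of area scale². SSD concatenates such sets from several feature maps of decreasing resolution, so that coarse maps cover large objects and fine maps cover small ones.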
Hope that’s clear, although I’m not quite sure what your question was.
Does anybody have an explanation of SSD (Single Shot MultiBox Detector)? I can’t understand what exactly is happening.
The first thing I’m stuck on is what a ‘default box’ is. And what is the output of the model?
@jeremy
Is it possible to use the same technique to tell FastAI what the bounding boxes are, now that we’re using PyTorch? Lesson 7 shows how to display Class Activation Maps, but it doesn’t show how to pass in the boxes when they’re already known for an image.
I just finished an implementation of SSD in Keras this week.
I can send you the wiki page I made explaining how it works in detail (I need to review it once more to make sure there are no mistakes).
@jeremy I’m a little bit stuck. How do I look at the predicted bounding boxes after running learn.fit? Also, how do I pass in an individual image with this method?
I tried this approach, which worked for the classification task, but it doesn’t seem to work here:
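```python
from fastai.conv_learner import *  # fastai 0.7: tfms_from_model, open_image, np

trn_tfms, val_tfms = tfms_from_model(arch, sz)
# Inference should use val_tfms (trn_tfms add random augmentation); open_image
# returns the RGB float32 array scaled to [0, 1] that the transforms expect
im = val_tfms(open_image(PATH + fn))
preds = learn.predict_array(im[None])  # im[None] adds the batch dimension
y = np.argmax(preds)  # only meaningful for a classification head, not a 4-coordinate bbox head
data.classes[y]
```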
Please note that it is not a tutorial on how to implement SSD but a summary of the information I collected while studying the model. I hope it can still be helpful.
Regarding the implementation, I encourage everyone to look at the following repo:
Hi there, I’ve also implemented SSD to detect a custom object. How can I crop out the part of the image inside the detected box? Is there a way?
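Once the box is in pixel coordinates, cropping is just array slicing. This sketch assumes the detector returns (xmin, ymin, xmax, ymax), either normalised to [0, 1] or already in pixels, and that the image is a NumPy array. crop_box is a hypothetical helper:

```python
import numpy as np

def crop_box(image, box, normalized=True):
    """Crop the region inside `box` = (xmin, ymin, xmax, ymax) from an H x W (x C) array.
    If `normalized`, the coordinates are fractions of width/height and get scaled to pixels."""
    H, W = image.shape[:2]
    xmin, ymin, xmax, ymax = box
    if normalized:
        xmin, xmax = xmin * W, xmax * W
        ymin, ymax = ymin * H, ymax * H
    # Round to integer pixel indices and clip to the image bounds
    x1, x2 = int(max(0, round(xmin))), int(min(W, round(xmax)))
    y1, y2 = int(max(0, round(ymin))), int(min(H, round(ymax)))
    return image[y1:y2, x1:x2]
```

With PIL, the equivalent is `Image.open(path).crop((x1, y1, x2, y2))`, using the same pixel coordinates.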