I just finished an implementation of SSD with Keras this week.
I can send you the wiki page I made explaining how it works in detail (I need to review it once more to make sure there are no mistakes).
Thanks Ben!
Just found this as well, by doing a search for 'bbox' in the new repo.
@jeremy I'm a little bit stuck. How do I look at the predicted bounding boxes after using learn.fit? Also, how do I pass in an individual image with this method?
I tried this method, which worked on the classification task, but it didn't seem to work here.
from PIL import Image
import numpy as np

# tfms_from_model, learn and data come from the fastai (v0.x) library
trn_tfms, val_tfms = tfms_from_model(arch, sz)  # transforms matching the architecture/size
img = Image.open(PATH + fn)                     # load a single image
im = trn_tfms(np.array(img))                    # apply the training-time transforms
preds = learn.predict_array(im[None])           # add a batch dimension and predict
y = np.argmax(preds)                            # highest-scoring class index
data.classes[y]                                 # map the index back to a class label
- Update
- The outputs from the learn.predict() method are bounding box positions.
- This worked for me, following a previous answer on the forum, to get an individual prediction.
from skimage import io
import numpy as np

io_img = io.imread(img_url)                             # read the image from a URL
im = self.trn_tfms(np.array(io_img) / 255.0)            # scale to [0, 1] and transform
preds = to_np(self.learn.models.model(V(T(im[None]))))  # to_np, V and T are fastai (v0.x) helpers
- I'm using it in an API, so it might look a little different.
Here is the document SSD-Description.pdf (2.4 MB).
Please note that it is not a tutorial on how to implement SSD but a summary of the information I collected while studying the model. I hope it can still be helpful.
Regarding the implementation, I encourage everyone to look at the following repo:
It is extremely useful for understanding the details.
Hi there, I would like to ask: I have implemented SSD too, to detect a custom object, but how should I crop out the part of the image detected by the box? Is there a way?
@chingjunehao: The output of the SSD detector is a tensor containing N_box "box tensors". Each "box tensor" contains the data relative to one box generated by the model, the data being (your data order might be different) [x_center, y_center, width, height, x_center_variance, y_center_variance, width_variance, height_variance, class1_score, ..., classN_score, x_center_offset, y_center_offset, width_offset, height_offset].
The scores and offsets are the parameters that SSD predicts and the other parameters are fixed.
So in order to crop your image, you need to follow the steps below (a rough sketch in code follows the list):
- Have a decoder function to compute the predicted position of the boxes in pixels.
- Filter the boxes to keep only the highest-scoring, non-overlapping ones (non-max suppression)
- Finally use the positions of the remaining boxes to crop your image.
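A minimal NumPy sketch of those three steps, assuming the [x_center, y_center, width, height, ...] layout described above and the standard SSD decoding convention (the helper names and index slices are assumptions; adapt them to your implementation):

import numpy as np

def decode_box(box, img_w, img_h):
    cx, cy, w, h = box[0:4]        # anchor center/size (normalized)
    vcx, vcy, vw, vh = box[4:8]    # fixed variances
    dcx, dcy, dw, dh = box[-4:]    # predicted offsets
    # Standard SSD decoding: offsets are relative to the anchor geometry
    cx = cx + dcx * vcx * w
    cy = cy + dcy * vcy * h
    w = w * np.exp(dw * vw)
    h = h * np.exp(dh * vh)
    # Normalized center/size -> pixel corner coordinates
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

def nms(boxes, scores, iou_thresh=0.45):
    # Greedy non-max suppression: keep the best box, drop heavy overlaps, repeat
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]
    return keep

# Example wiring, assuming `raw` has shape (N_box, box_length):
# pixel_boxes = np.array([decode_box(b, img_w, img_h) for b in raw])
# scores = raw[:, 8:-4].max(axis=1)   # best class score per box
# keep = nms(pixel_boxes, scores)

The surviving (x_min, y_min, x_max, y_max) tuples can then be passed straight to an image-cropping call, as in the PIL snippet further down the thread.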
Interesting thread!
I'm using a combination of Inception + SSD to train on my own custom dataset. After weeks of painful annotation, I finally got to a good level of accuracy on the object categories.
Strangely though, I can see a consistent pattern in the bounding box predictions across all classes. It's like they are all expanded on the right side to include extra space, yet tightly bound on the left (especially the bottom left). Check out the images below. And I'm not exaggerating: EVERY prediction comes out this way, so I'm pretty sure it has something to do with my data or a possible bug.
Just wondering if anyone has faced a similar issue or has any ideas on why this might be happening? Any pointers would be great!
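A minimal way to eyeball the ground-truth annotations for this kind of systematic shift (the annotation format and field names here are assumptions) is to draw them back onto a few training images:

from PIL import Image, ImageDraw

# Hypothetical annotation: an image path plus (x_min, y_min, x_max, y_max) in pixels
def show_annotation(path, x_min, y_min, x_max, y_max):
    assert x_min < x_max and y_min < y_max, 'suspicious coordinate order'
    img = Image.open(path)
    ImageDraw.Draw(img).rectangle([x_min, y_min, x_max, y_max], outline='red', width=3)
    img.show()

If the drawn boxes already lean to one side, the problem is in the annotations or the coordinate order, not the model.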
Yes, this implementation is amazing. Reading the documentation and the code for the ground truth encoding process helped my understanding a lot. It's actually a lot better (more comprehensive, better documentation, all original models provided) than the one linked further up in this thread:
Is something like this what you are looking for, to save part of the image?
from PIL import Image

img = Image.open('saved_image.png')
print(img.size)                               # (width, height)
croppedIm = img.crop((490, 980, 1310, 1217))  # box is (left, upper, right, lower) in pixels
croppedIm.save('crop.png')
(late answer)
Hello All
So after reading this, I tried object localization on my own data, which contains a single object per image without any class, i.e. I have to predict only the bounding box of the object without classifying it, as the objects are very different. Most of the objects are grocery items on a white background. I tried adding a simple regressor head on top of ResNet and MobileNet, but in vain, as the results aren't good. Also, some of my test images contain two adjacent objects, like a pair of shoes, whereas in my training data there is only one shoe. So this is one thing I'm not able to tackle. Can somebody help?
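For reference, a minimal Keras sketch of the kind of single-box regressor head described above (the input size, head width, and normalized [x_min, y_min, x_max, y_max] target format are all assumptions):

import tensorflow as tf
from tensorflow.keras import layers, Model

# Pretrained backbone without its classifier head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')

# Single-box regression head: four outputs, no class scores
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(128, activation='relu')(x)
bbox = layers.Dense(4, activation='sigmoid', name='bbox')(x)  # normalized corners

model = Model(base.input, bbox)
model.compile(optimizer='adam', loss='mse')  # L1- or IoU-based losses are also common

Note that a head like this predicts exactly one box per image by construction, so the two-adjacent-objects case (the pair of shoes) cannot be handled this way; that calls for a multi-box detector such as SSD itself.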