Lesson 2: whales and multi-labels for object detection

The assignment for lesson 2 was “Enter another competition”: http://wiki.fast.ai/index.php/Lesson_2 (I am following the 2017 version of the course).

Following the suggestions from the wiki (http://wiki.fast.ai/index.php/Image_Datasets) I chose the Right Whale Recognition competition. It was listed as “easy with some challenges”.

My strategy is as follows:

  1. Label a set of training/validation images with bounding box coordinates by hand.
  2. Resize all images to the smallest size in the dataset, being careful with scaling issues.
  3. Build a bounding box predictor on the resized images from step 2, using the scaled bounding box coordinates from step 1 as training targets.
  4. Crop the images to the predicted bounding boxes.
  5. Resize the cropped images to the input size expected by the VGG16 model.
  6. Adjust the VGG16 model by removing the last layers and adding new fully connected layers.
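For step 2, the "scaling issues" mostly come down to scaling the hand-labeled box by the same factors as the image. A minimal sketch of that bookkeeping (function name is my own, not from the course):

```python
def scale_bbox(bbox, orig_size, target_size):
    """Scale (x1, y1, x2, y2) pixel coordinates from orig_size to target_size.

    Sizes are (width, height). Non-uniform resizing is allowed, so x and y
    get separate scale factors.
    """
    (ow, oh), (tw, th) = orig_size, target_size
    sx, sy = tw / ow, th / oh
    x1, y1, x2, y2 = bbox
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

Applying this to every row of the labels CSV while resizing the corresponding image keeps the targets for step 3 consistent with the resized inputs.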

I actually went through the trouble of step 1 and built a small labeling tool in Python to manually annotate 300 whales with bounding box coordinates. (I can share the CSV if anyone is interested in reusing it!) In addition, I built a cropping tool for step 4 that takes these coordinates and the images as input.
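For anyone building a similar cropping tool, the core of it is just array slicing once the image is loaded; the sketch below (my own function, not the poster's tool) also clips the box to the image bounds, since hand labels occasionally stray a pixel or two outside:

```python
import numpy as np

def crop_bbox(img_array, bbox):
    """Crop an image array of shape (H, W, C) to an integer bounding box.

    bbox is (x1, y1, x2, y2) in pixel coordinates; values are clipped to
    the image bounds so slightly out-of-range labels don't raise errors.
    """
    h, w = img_array.shape[:2]
    x1, y1, x2, y2 = bbox
    x1, x2 = max(0, int(x1)), min(w, int(x2))
    y1, y2 = max(0, int(y1)), min(h, int(y2))
    return img_array[y1:y2, x1:x2]
```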

I am stuck at step 3. My idea was to divide the image into a 2D grid and, for each cell, label whether part of a whale is in it, turning localization into a multi-label problem. Is this a good approach, or just too naive? I took inspiration from the interview with the winners of the competition [1], who mentioned:

“Seems like determining if a part of an image contains something is easier than determining where it is, who knew”

I think this competition may just be too early for someone who has only finished lesson 2, and a better approach is probably something more advanced like Fast R-CNN. Is it better to just hold off until I reach the end of the course?

[1] http://blog.kaggle.com/2016/01/29/noaa-right-whale-recognition-winners-interview-1st-place-deepsense-io/