The assignment for lesson 2 was “Enter another competition”: http://wiki.fast.ai/index.php/Lesson_2 (I am following the 2017 version of the course).
Following the suggestions from the wiki (http://wiki.fast.ai/index.php/Image_Datasets), I chose the Right Whale Recognition competition. It was listed as “easy with some challenges”.
My strategy is as follows:
1. Label by hand a set of training/validation images with bounding box coordinates.
2. Resize all images to the smallest size in the dataset, being careful with scaling issues.
3. Build a bounding box predictor on the resized images from step 2, using the scaled bounding box coordinates from step 1 as training targets.
4. Crop the predicted bounding boxes.
5. Resize the cropped images so that they can be used with the VGG16 model.
6. Adjust the VGG16 model by removing the last layers and adding new fully connected layers.
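To make the "scaled bounding box coordinates" part concrete, here is a minimal sketch of how the hand-labeled boxes could be rescaled along with the images. It assumes boxes are stored as (x, y, width, height) in pixels with the origin at the top-left corner; that layout and the function name `scale_bbox` are my assumptions for illustration, not necessarily how the label tool stores them.

```python
def scale_bbox(bbox, orig_size, new_size):
    """Rescale a bounding box when its image is resized.

    bbox:      (x, y, w, h) in pixels of the original image (assumed layout)
    orig_size: (width, height) of the original image
    new_size:  (width, height) of the resized image
    """
    x, y, w, h = bbox
    orig_w, orig_h = orig_size
    new_w, new_h = new_size

    # Horizontal and vertical scale factors can differ if the
    # resize does not preserve the aspect ratio.
    sx = new_w / orig_w
    sy = new_h / orig_h

    return (x * sx, y * sy, w * sx, h * sy)


# Example: an image halved in both dimensions halves every coordinate.
print(scale_bbox((100, 50, 200, 100), (1000, 500), (500, 250)))
# -> (50.0, 25.0, 100.0, 50.0)
```

Using separate factors for x and y keeps the box aligned with the whale even when the resize distorts the image, which is one of the scaling issues to watch for.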
I actually went through the trouble of step 1) and built a small labeling tool in Python to manually label bounding box coordinates for 300 whales. (I can share the CSV if anyone is interested in reusing it!) In addition, I built a cropping tool for step 4) that takes these coordinates and the images as input.
I am stuck at step 3). My idea was to divide the image into a 2D grid and, for each cell in the grid, label whether part of a whale is in it or not, which makes it a multi-label problem. Is this a good approach or just too naive? I was inspired by the interview with the winners of the competition [1], who mentioned:
“Seems like determining if a part of an image contains something is easier than determining where it is, who knew”
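To illustrate the grid idea, here is a rough sketch of how one bounding box could be turned into per-cell labels. It again assumes boxes are (x, y, width, height) in pixels with a top-left origin, and it marks a cell as 1 if the box overlaps it at all. The function name `bbox_to_grid_labels` and the "any overlap" rule are assumptions made for this example; a stricter overlap threshold might work better in practice.

```python
def bbox_to_grid_labels(bbox, image_size, grid_shape):
    """Turn one bounding box into a grid of 0/1 labels.

    bbox:       (x, y, w, h) in pixels (assumed layout)
    image_size: (width, height) of the image
    grid_shape: (rows, cols) of the grid laid over the image
    Returns a list of rows, each a list of 0/1 values.
    """
    x, y, w, h = bbox
    img_w, img_h = image_size
    rows, cols = grid_shape
    cell_w = img_w / cols
    cell_h = img_h / rows

    labels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Pixel extent of this grid cell.
            cx0, cy0 = c * cell_w, r * cell_h
            cx1, cy1 = cx0 + cell_w, cy0 + cell_h
            # Two rectangles overlap if they overlap on both axes.
            overlaps = x < cx1 and x + w > cx0 and y < cy1 and y + h > cy0
            row.append(1 if overlaps else 0)
        labels.append(row)
    return labels


# Example: a 400x400 image with a 4x4 grid (100-pixel cells).
for row in bbox_to_grid_labels((150, 50, 100, 100), (400, 400), (4, 4)):
    print(row)
# -> [0, 1, 1, 0]
#    [0, 1, 1, 0]
#    [0, 0, 0, 0]
#    [0, 0, 0, 0]
```

Flattening this grid gives one target vector per image, which could then be trained with a sigmoid output per cell and a binary cross-entropy loss, as in other multi-label setups.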
I think this competition is just way too early for someone who has only just finished lesson 2, and a better approach is probably to use something more advanced like Fast R-CNN. Is it better to just hold off until I reach the end of the course?