Object Detection


I know that object detection hasn’t been covered yet and won’t be until part 2. However, I wanted to see if I could extend what I learned in part 1, and combine it with part 2 of previous courses, to build an extremely basic object detector (i.e., the first step of object detection from last year’s course: predicting the bounds of the largest entity in an image).
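To make the "largest entity" step concrete, here is a minimal sketch of picking the largest box for one image. It assumes each box is a [top, left, bottom, right] list in pixels; that layout and the name largest_box are assumptions for illustration, not something taken from the notebook. Raw COCO annotations use [x, y, width, height] instead, so the area calculation would need adjusting for those.

```python
def box_area(box):
    """Area of a box given as [top, left, bottom, right] in pixels (assumed layout)."""
    top, left, bottom, right = box
    return max(0, bottom - top) * max(0, right - left)


def largest_box(boxes):
    """Return the box with the largest area; hypothetical helper for illustration."""
    if not boxes:
        raise ValueError("image has no annotated boxes")
    return max(boxes, key=box_area)


boxes = [[10, 20, 40, 60], [0, 0, 50, 100], [5, 5, 15, 15]]
print(largest_box(boxes))  # [0, 0, 50, 100]
```

Images with no annotations raise an error here; whether to skip or drop them is a separate choice.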

Unfortunately, I’ve run into some problems. While my model does technically train, it never produces useful predictions. After several days of playing with the arguments, tuning, and dozens of epochs, the closest I’ve come is an average error of about 50px, and I’m struggling to reproduce even that. I’ve confirmed this both with a custom accuracy metric (which I believe represents average px distance) and visually by looking at the images: most of the predicted points trend towards the middle of the image.
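For reference, this is roughly what an "average px distance" metric computes. It is a plain-Python sketch, not the metric from the notebook: it assumes predictions and targets are (y, x) pairs scaled to (-1, 1), and the name mean_pixel_distance is made up for illustration.

```python
import math


def mean_pixel_distance(preds, targets, height, width):
    """Average Euclidean distance in pixels between predicted and target points.

    preds, targets: equal-length lists of (y, x) pairs scaled to (-1, 1) (assumed).
    height, width: image size in pixels, used to undo the scaling.
    """
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and the same length")
    total = 0.0
    for (py, px), (ty, tx) in zip(preds, targets):
        # A span of 2 in scaled coordinates covers the full image side.
        dy = (py - ty) * height / 2
        dx = (px - tx) * width / 2
        total += math.hypot(dy, dx)
    return total / len(preds)


# A prediction at the image centre, target a quarter of the height below it.
print(mean_pixel_distance([(0.0, 0.0)], [(0.5, 0.0)], 200, 200))  # 50.0
```

One useful comparison is the score of a constant prediction of (0.0, 0.0) for every point. If the trained model is not clearly better than that baseline, it has probably only learned to predict the centre.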

I’m using the coco_sample dataset (as the pascal dataset from the previous year’s course is not available in v1). I’ve tried training over the full set (around 19k images for training, 2k for validation) but am currently using just 1k for training and 1k for validation so I can iterate faster. This has not had an impact on the results.

What I’ve tried so far:

  • toggling between the l1loss loss function prescribed in the previous course and the MSE loss function that is the default
  • providing (and not providing) a custom_head function (again, as described)
  • providing learn.crit and learn.opt_fn
  • changing the labelling function to provide (top left, bottom right) coords, (top left, width, height) coords, or all four corners: (top, left), (top, right), (bottom, left), (bottom, right)
  • playing with the order of the labelled points - COCO states that its data is x first, while fastai requires y first, though it seems fastai’s version of this dataset has already handled this
  • playing with the amount and type of transforms (e.g., toggling on/off do_flip, max_rotate, max_zoom, p_affine)
  • clamping the label values to between (-1, 1), in case a rotation or crop has moved them outside the bounds of the new image
  • changing the input image size, both the actual size and whether to crop, squish or pad
  • increasing and decreasing the learning rate
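
To illustrate the labelling and clamping items above, here is a minimal sketch of turning one pixel-space box into two y-first corner points scaled to (-1, 1). The [top, left, bottom, right] input layout and the function names are assumptions for illustration; this is not the labelling function from the notebook, and fastai may already do the scaling itself depending on how the labels are passed in.

```python
def scale_point(y, x, height, width):
    """Map a pixel (y, x) to (-1, 1) coordinates, keeping y first."""
    return (2 * y / height - 1, 2 * x / width - 1)


def clamp(value, lo=-1.0, hi=1.0):
    """Limit value to the range [lo, hi]."""
    return max(lo, min(hi, value))


def box_to_scaled_corners(box, height, width):
    """Convert [top, left, bottom, right] pixels (assumed layout) into
    [(top, left), (bottom, right)] points scaled and clamped to (-1, 1)."""
    top, left, bottom, right = box
    corners = [scale_point(top, left, height, width),
               scale_point(bottom, right, height, width)]
    return [tuple(clamp(v) for v in point) for point in corners]


# The right edge (250px) lies outside a 200px-wide image, so it is clamped to 1.0.
print(box_to_scaled_corners([0, 50, 100, 250], 200, 200))
# [(-1.0, -0.5), (0.0, 1.0)]
```

Note that clamping changes the target for any box that a transform pushed partly out of frame, so a heavily clamped dataset could itself pull predictions towards the image interior.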

Some questions:

  1. How are learn.crit and learn.opt_fn applied to the model? Do they override or supplement the loss_func?
  2. What is an acceptable error rate, measured in average px distance, for the COCO dataset? I looked at the website but couldn’t understand the metrics in use.
  3. While the API for fastai v1 isn’t complete in terms of object detection, is my current approach “correct”? Should I be seeing better results than this?

I’ve posted my notebook (based on the segmentation example from part 1, but obviously heavily modified).