Training-Time/Dynamic Data Creation ... possible in FastAI?

NOTE: I’m not entirely sure whether to put this here, in Applications, or in Theory, since it feels like it touches all of them. This is part question, part open discussion.

I’ve been working on repurposing RetinaNet as a pilot-consciousness detector today, and there’s a thought I want to throw out here before I go to bed. Can the library support run-time data creation? Would it be a nightmare to implement?

i.e., could we create temporary training examples inside the model/learner while it’s running?

What I mean is: I’m thinking of using the bounding boxes created by RetinaNet (in Keras right now) to crop images and feed those crops into a classifier. In some cases you have (well, I do) a training-set image with more than one instance of the thing you’re detecting (in my case, two views of a person). In those cases the bounding-box part of the model will identify two boxes. Since I want to crop those images and feed them into a classifier, I have a problem when I’m training on file 0001.jpg and there are two regions of it to crop.

I see 3 immediate options at this point:

  1. Throw both cropped sub-images through the classifier and take the average of their predictions (is this possible? It feels like it’d require modifying the underlying code.)
  2. Take the first or most-confident bounding-box “prediction” and just send that through.
  3. Treat the 2 sub-images as separate training images with the same label.
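For comparison, the three options can be sketched in plain Python. This is a minimal sketch, assuming the detector returns a list of boxes with one confidence score each and the classifier returns a list of class probabilities per crop; all function names are hypothetical and none of this is fastai API. In a real training loop, option 1 would average the per-crop outputs of one image before computing the loss, whereas option 3 only changes how the dataset is built.

```python
from typing import List, Sequence, Tuple

Probs = List[float]                        # assumed: class probabilities for one crop
Box = Tuple[float, float, float, float]    # (x1, y1, x2, y2)

def average_crop_predictions(crop_probs: Sequence[Probs]) -> Probs:
    """Option 1: average the per-class probabilities over all crops of one image."""
    if not crop_probs:
        raise ValueError("need at least one crop")
    n = len(crop_probs)
    return [sum(p[i] for p in crop_probs) / n for i in range(len(crop_probs[0]))]

def most_confident_box(boxes: Sequence[Box], scores: Sequence[float]) -> Box:
    """Option 2: keep only the box the detector is most confident about."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return boxes[best]

def expand_to_examples(image_id: str, crops: Sequence, label: str) -> list:
    """Option 3: every crop becomes its own training example with the parent's label."""
    return [(f"{image_id}_{k}", crop, label) for k, crop in enumerate(crops)]
```

Averaging probabilities is only one way to combine crops; averaging logits, or taking the per-class maximum, are equally plausible choices worth trying.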

Now… there’s another way entirely to go about this, which may be more in line with the way the library currently handles things: run a separate pass with the ‘cropper’ network, save all cropped images to a tmp/ folder (keeping track of labels), and then run that cropped/modified/dynamic dataset through the classifier.
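The core of such a ‘cropper’ pass is cutting each predicted box out of its image. Below is a minimal stdlib sketch, assuming boxes come as float `(x1, y1, x2, y2)` pixel coordinates and the image is a list of pixel rows standing in for a real array; the `pad` parameter is a hypothetical extra margin around the box. Detector boxes can spill past the image edge, so the sketch clamps them to the image bounds.

```python
import math
from typing import List, Tuple

Image = List[List[int]]                   # rows of pixel values; stand-in for a real image array
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels; x2/y2 are exclusive

def crop_box(img: Image, box: Box, pad: int = 0) -> Image:
    """Crop `box` (plus an optional `pad` margin) from `img`, clamped to the image bounds."""
    h, w = len(img), len(img[0])
    x1, y1, x2, y2 = box
    # Round outward so fractional detector coordinates never cut into the object.
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(w, math.ceil(x2) + pad)
    bottom = min(h, math.ceil(y2) + pad)
    if right <= left or bottom <= top:
        raise ValueError(f"box {box} does not overlap the image")
    return [row[left:right] for row in img[top:bottom]]
```

With a NumPy image of shape (height, width, channels), the same crop is the slice `img[top:bottom, left:right]`.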

My mind likes the idea of handling the crops on the fly, but I think a more implementable solution is creating the tmp/ folder.

Anyway, those are some thoughts. I’ll probably get to work on one of them tomorrow or this week.

Keras RetinaNet – Delft Robotics
Focal Loss for Dense Object Detection (RetinaNet paper)
Here’s the notebook I was working on, for visual reference. Be warned, it’s not meant to be pretty, but does show the ‘discovery path’, and examples towards the bottom.


I’d suggest starting with the simple approach (save the crops) and see how it goes. If it works well, you could then try the other approaches. The recent Kaggle product identification competition had multiple images per product - it might be a good source of ideas for how to combine things.

Hi Wayne, just curious: have you looked at YOLOv2 for one-step detection and classification? I just started looking at RetinaNet; speed is not that critical for me, but accuracy is.

Ah, there’s a speed/accuracy tradeoff? YOLOv2 is actually the first detection model I looked at. I started using RetinaNet because of Lex Fridman’s Bording Detector (it looked pretty fast in the demo GIF), and I didn’t want to learn a new DL library, Darknet, for YOLOv2 (though I see there may be TF implementations now).

I’m just about to put up a dev demo. Right now it averages 0.32 seconds for the 1st stage (TF RetinaNet) and 0.28 seconds for the 2nd stage (PyTorch ResNet34), including image-processing steps.
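Per-stage averages like these can be collected with a small timing harness. This is a sketch, assuming each stage is wrapped in a plain callable (`stage1` runs the detector and cropping, `stage2` runs the classifier on the crops); both wrappers are hypothetical. With the numbers above, 0.32 s + 0.28 s ≈ 0.60 s per image, so a strictly sequential pipeline handles roughly 1.7 images per second.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List

def time_pipeline(images: Iterable, stage1: Callable, stage2: Callable) -> Dict[str, float]:
    """Average wall-clock seconds per image for each stage of a two-stage pipeline."""
    t1: List[float] = []
    t2: List[float] = []
    for img in images:
        start = time.perf_counter()
        crops = stage1(img)          # detector + cropping (hypothetical wrapper)
        mid = time.perf_counter()
        stage2(crops)                # classifier on the crops (hypothetical wrapper)
        end = time.perf_counter()
        t1.append(mid - start)
        t2.append(end - mid)
    if not t1:
        raise ValueError("no images to time")
    return {"stage1": mean(t1), "stage2": mean(t2), "total": mean(t1) + mean(t2)}
```

Timing the stages separately makes it clear which model to optimise first, and `time.perf_counter` is used because it is a monotonic, high-resolution clock.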

I’m hoping the 1st-stage RetinaNet model will speed up after finetuning. Right now I’m pulling the highest-confidence prediction, and it’s predicting all 80 COCO classes instead of the 1 ‘pilot’ class I need. That may or may not be delusional thinking :wink:
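Until the model is finetuned to a single class, one option is to ignore every class except ‘person’ before taking the most confident box. A minimal sketch, assuming the detector output has already been unpacked into parallel lists of boxes, scores, and integer labels; the default `target_label=0` (person) and `min_score=0.5` are assumptions to check against your own label map and data.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]

def best_box(boxes: Sequence[Box], scores: Sequence[float], labels: Sequence[int],
             target_label: int = 0, min_score: float = 0.5) -> Optional[Box]:
    """Highest-scoring box of `target_label` with score >= min_score, or None."""
    candidates = [(s, b) for b, s, l in zip(boxes, scores, labels)
                  if l == target_label and s >= min_score]
    if not candidates:
        return None                  # no confident detection of the target class
    return max(candidates, key=lambda c: c[0])[1]
```

Returning `None` rather than the best box of some other class lets the caller skip the frame or flag it for manual review.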

If I run into some serious barriers, I may revisit other detection models. This project’s been an enormous software-dev learning experience.

Oh, and @jeremy: I settled on a solution for the inter-stage dataset. For training, the cropped images are saved to a tmp/ folder along with a new CSV file. At runtime, the crops are sent directly to the classifier model. And I think with some tweaks this can scale to any number of crops per image (so 001.jpg → 001a.jpg, 001b.jpg, etc.).
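The naming scheme and label CSV for the tmp/ dataset can be sketched with the standard library alone. This is an illustration, not the actual implementation: it assumes letter suffixes (so at most 26 crops per image) and a two-column `file,label` layout, both of which are hypothetical choices. When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.

```python
import csv
import io
import os
import string
from typing import Iterable, List, TextIO, Tuple

def crop_names(filename: str, n_crops: int) -> List[str]:
    """'001.jpg' with 2 crops -> ['001a.jpg', '001b.jpg']."""
    if n_crops > len(string.ascii_lowercase):
        raise ValueError("more than 26 crops; switch to numeric suffixes")
    stem, ext = os.path.splitext(filename)
    return [f"{stem}{string.ascii_lowercase[i]}{ext}" for i in range(n_crops)]

def write_labels_csv(rows: Iterable[Tuple[str, str]], f: TextIO) -> None:
    """Write (crop_filename, label) rows, with a header, to an open text file."""
    writer = csv.writer(f)
    writer.writerow(["file", "label"])
    writer.writerows(rows)

# Example: two detections in 001.jpg, both inheriting the parent label 'pilot'.
buf = io.StringIO()
write_labels_csv([(name, "pilot") for name in crop_names("001.jpg", 2)], buf)
```

Each crop simply inherits its parent image’s label, which is option 3 from the original post applied at dataset-build time.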

Just finished debugging the ‘semi-supervised’ mode for the 1st stage today. My dataset is ~7,600 images, so I want to be able to take a break part-way through. Also, instead of saving to a ‘reject’ folder, manual labelling is done in the terminal, since the X,Y coords are displayed. I’d like to share two examples below:

In this example, box no. 2 is selected as ‘correct’.

‘0’ signals manual labelling. The X,Y coordinates are obtained by hovering over the window.
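A resumable labelling loop like this can be sketched in plain Python. The assumptions are: progress is appended to a CSV after every image so a session can stop and restart; the detector is wrapped in a hypothetical `propose(filename)` callable; and manual coordinates are typed as `x1,y1,x2,y2`. The ‘q’ quit key and the CSV layout are illustrative choices, not the author’s implementation.

```python
import csv
import os
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2)

def load_done(progress_csv: str) -> Set[str]:
    """Filenames already labelled in an earlier session (empty set if none)."""
    if not os.path.exists(progress_csv):
        return set()
    with open(progress_csv, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}

def parse_choice(answer: str, boxes: Sequence[Box]) -> Optional[Box]:
    """'1'..'n' picks a proposed box; '0' returns None to request manual labelling."""
    k = int(answer.strip())
    if k == 0:
        return None
    if not 1 <= k <= len(boxes):
        raise ValueError(f"expected 0-{len(boxes)}, got {k}")
    return boxes[k - 1]

def parse_manual(answer: str) -> Box:
    """Parse 'x1,y1,x2,y2' read off the image window; corners may come in any order."""
    x1, y1, x2, y2 = (int(v) for v in answer.split(","))
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def label_session(files: Iterable[str],
                  propose: Callable[[str], List[Box]],
                  progress_csv: str,
                  ask: Callable[[str], str] = input) -> None:
    """Review detector proposals file by file, appending each answer so work can resume."""
    done = load_done(progress_csv)
    with open(progress_csv, "a", newline="") as f:
        writer = csv.writer(f)
        for name in files:
            if name in done:
                continue                       # labelled in a previous session
            boxes = propose(name)              # hypothetical: detector boxes for this file
            answer = ask(f"{name}: box 1-{len(boxes)}, 0 = manual, q = quit: ")
            if answer.strip().lower() == "q":
                break                          # everything so far is already on disk
            box = parse_choice(answer, boxes)
            if box is None:
                box = parse_manual(ask("x1,y1,x2,y2: "))
            writer.writerow([name, *box])
            f.flush()                          # persist each label immediately
```

Passing `ask` as a parameter keeps the loop testable without a terminal; in normal use it defaults to the built-in `input`.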


Sounds interesting. I like the idea of an intermediate dataset; I need something like this. My solution might require two-stage training, though I’m trying to avoid that, maybe with the help of RetinaNet now, so your results are encouraging… In my case, though, the objects aren’t COCO classes and they are small, so I’m starting to prep the data as we speak…
And yes, there are some decent Keras/TF repos with YOLOv2. I used one of them successfully, in the sense that it found the objects, but the bboxes were not that accurate.
Oh, and another advantage of YOLO is a small model; the RetinaNet (ResNet50-based) model is half a gig.

Playing with your notebook right now. I found it quite useful and it saved me quite some time, thanks!

Btw, out of the box I see a lot of false positives (FP) detected…
(Also, finally a legit reason to show off my pets: one of them is considered to be a person, which comes as no surprise to me…)


Just uploaded a demo video for the current build of my G-LOC Detector:

It’s very cool to see how this is coming together! :slight_smile: