Object detection model training (progress report)

:sparkler: I’ve been experimenting with synthetic data to boost my object detection model’s performance. I’m using IceVision to do the heavy lifting (on top of fastai), and in this blog post I walk through some iterations of my process, how I’m thinking about where synthetic data fits, and my data-centric approach.

There are lots of knobs to fiddle with in machine learning :factory:, and even more so when synthetic data is in the mix. I’ve really enjoyed building a better intuition for what makes sense for my use case, and with a current best COCO score of 86% for my model, the results are fairly decent as well! :tada:
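(As a quick aside for anyone new to the COCO metric: it averages precision over a range of intersection-over-union (IoU) thresholds, and IoU is the core ingredient. Here’s a minimal, self-contained sketch of IoU for axis-aligned boxes; the box coordinates are made up for illustration.)

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes in (x_min, y_min, x_max, y_max) form."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Perfect overlap scores 1.0; disjoint boxes score 0.0.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

A predicted box typically counts as a true positive only when its IoU with a ground-truth box clears the threshold, which is why the COCO score is a reasonable single number to track across training runs.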

:page_facing_up: I started by creating some generally applicable synthetic images. I then figured out (with much help from FiftyOne) that these new synthetic images didn’t really help much with the examples where the model was struggling. So I created a (much smaller) set of synthetic images which were closer to the hard examples.
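(In my workflow FiftyOne did the actual evaluation and browsing; the sketch below just illustrates the underlying idea of ranking validation images by detection errors to surface hard examples. The per-image records and the hardness score are hypothetical placeholders, not FiftyOne output.)

```python
# Hypothetical per-image evaluation records: (image_id, false_positives, false_negatives).
records = [
    ("img_001", 0, 0),
    ("img_002", 3, 1),
    ("img_003", 1, 4),
]

def hardness(record):
    # A simple hardness score: total detection errors on the image.
    _, false_positives, false_negatives = record
    return false_positives + false_negatives

# Hardest examples first; these are the ones worth targeting with synthetic data.
hard_examples = sorted(records, key=hardness, reverse=True)
print([image_id for image_id, _, _ in hard_examples])  # → ['img_003', 'img_002', 'img_001']
```

Generating synthetic images that resemble the top of this ranking, rather than generic ones, is what the second (much smaller) synthetic set was about.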

:chart_with_upwards_trend: This gave a really nice boost to my model performance and overall felt like a more focused way of tackling the areas where my model was weakest.

Feedback / suggestions / critiques welcomed! :pray:t2:

(And thanks to @farid for guiding me through this process. It’s been a journey!)