Object detection requirements


I am trying to build an object detection model (based on one of the YOLO or SSD architectures). My task is to localize, i.e., draw bounding boxes around, specific fields of interest on documents. For example, I want to train a model that can draw bounding boxes around the first name, last name, and DOB on an identification document.

The following are my questions:

  1. What kind of annotated dataset would I need to train this model? Specifically, do I need to annotate background bounding boxes in my data too?
  2. What is a good dataset size? 5,000 annotated documents? 10,000?
  3. Can I take a pre-trained object detection model that has been trained on a large open dataset and fine-tune it on my data, similar to what we typically do with transfer learning for image classification?
  4. Does fastai support this kind of transfer learning for object detection?

Would love some thoughts from folks who have experience in this area or any pointers to existing work that does this.

See lessons 8 and 9; they’re specifically about this topic.

I believe you could get reasonable results by starting from an initial dataset of about a hundred good-quality ID photos (covering the different designs if they vary from year to year), annotating them manually, and then applying custom augmentation that fills the fields with various types of data, coupled with standard photo effects such as varying lighting conditions, flash glare, etc.
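To make the augmentation idea a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the lessons or from fastai: the function names (`to_yolo`, `adjust_lighting`, `shift_image`, `augment`), the toy grayscale image stored as a list of rows, and the parameter ranges are all assumptions made for illustration. It only covers the lighting and small-shift part, not filling fields with synthetic data. The point it shows is that photometric changes leave the boxes alone, while geometric changes must move the boxes along with the pixels.

```python
import random

def to_yolo(box, img_w, img_h):
    """Convert a pixel box (x0, y0, x1, y1) to normalized (cx, cy, w, h)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
            (x1 - x0) / img_w, (y1 - y0) / img_h)

def adjust_lighting(pixels, gain, bias):
    """Photometric change: scale and offset intensities, clipped to 0..255.
    Bounding boxes are unaffected by this step."""
    return [[min(255, max(0, round(p * gain + bias))) for p in row]
            for row in pixels]

def shift_image(pixels, labeled_boxes, dx, dy, fill=255):
    """Geometric change: translate the image by (dx, dy) and move the boxes
    with it, clipping them to the image and dropping boxes that vanish."""
    h, w = len(pixels), len(pixels[0])
    shifted = [[pixels[y - dy][x - dx]
                if 0 <= y - dy < h and 0 <= x - dx < w else fill
                for x in range(w)] for y in range(h)]
    moved = []
    for label, (x0, y0, x1, y1) in labeled_boxes:
        nx0, nx1 = max(0, x0 + dx), min(w, x1 + dx)
        ny0, ny1 = max(0, y0 + dy), min(h, y1 + dy)
        if nx1 > nx0 and ny1 > ny0:
            moved.append((label, (nx0, ny0, nx1, ny1)))
    return shifted, moved

def augment(pixels, labeled_boxes, rng, max_shift=2):
    """Apply a random lighting change followed by a random small shift.
    The ranges below are arbitrary illustrative choices."""
    gain = rng.uniform(0.7, 1.3)
    bias = rng.uniform(-20, 20)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    lit = adjust_lighting(pixels, gain, bias)
    return shift_image(lit, labeled_boxes, dx, dy)

if __name__ == "__main__":
    rng = random.Random(0)
    image = [[200] * 8 for _ in range(6)]  # toy 8x6 grayscale "document"
    boxes = [("first_name", (1, 1, 5, 2)), ("dob", (1, 3, 4, 4))]
    new_image, new_boxes = augment(image, boxes, rng)
    for label, box in new_boxes:
        print(label, to_yolo(box, img_w=8, img_h=6))
```

With real photos you would use an image library instead of nested lists, but the bookkeeping is the same: each manually annotated document can yield many training samples as long as every geometric transform is applied to the boxes too.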
