I've been working through Lesson 7 and have a pretty good understanding of using annotations (specifically bounding boxes) to train the network and to predict a bounding box when a single instance of a category/class appears in an image.
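For reference, my current single-object setup is roughly a shared backbone with two heads: a softmax classifier and a 4-number box regressor. This is just a sketch of the idea, not my exact code; the layer sizes, class count, and coordinate convention are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 3  # placeholder class count

inputs = keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# two heads off the shared features: one class prediction, one box
class_out = layers.Dense(NUM_CLASSES, activation="softmax", name="class")(x)
bbox_out = layers.Dense(4, name="bbox")(x)  # (x1, y1, x2, y2)

model = keras.Model(inputs, [class_out, bbox_out])
model.compile(optimizer="adam",
              loss={"class": "categorical_crossentropy", "bbox": "mse"})
```

This works fine when there is exactly one object per image, since the output sizes are fixed at one class vector and one box.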
However, I would like to try to extend this, so that I can train the network to look for:
1) multiple categories/classes in a single image (for example, 2 or more different types of fish in a single image - although not necessarily fish); and
2) their respective bounding boxes
so that the final prediction outputs, for example, 2, 3, or 5 bounding boxes along with the (possibly different) classes identified in the image. Any given image may contain zero or more objects from different categories/classes.
However, I'm struggling a little to tie it all together, in particular how to build an architecture that can handle an arbitrary number of classes and bounding boxes in a single image, and wondered if anyone had any pointers in the right direction.
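To frame the discussion: one workaround I've come across (the idea behind SSD/YOLO-style detectors, if I've understood them) is to keep the network's output fixed-size and instead pad the variable-length list of ground-truth boxes up to a maximum count, filling unused slots with a "background" class. A rough NumPy sketch of that target encoding, where MAX_BOXES, the class ids, and the function name are all just my own placeholder choices:

```python
import numpy as np

MAX_BOXES = 5    # fixed upper bound the network is built around
NUM_CLASSES = 3  # e.g. background=0, fish type A=1, fish type B=2
BACKGROUND = 0

def encode_targets(boxes, classes):
    """Pad a variable-length list of (x1, y1, x2, y2) boxes and class ids
    into fixed-size arrays that a loss function can consume.

    Returns:
      box_targets: (MAX_BOXES, 4) float array, zero-padded
      cls_targets: (MAX_BOXES, NUM_CLASSES) one-hot, padded with background
    """
    assert len(boxes) <= MAX_BOXES, "image has more objects than MAX_BOXES"
    box_targets = np.zeros((MAX_BOXES, 4), dtype=np.float32)
    cls_targets = np.zeros((MAX_BOXES, NUM_CLASSES), dtype=np.float32)
    cls_targets[:, BACKGROUND] = 1.0  # default every slot to background
    for i, (box, cls) in enumerate(zip(boxes, classes)):
        box_targets[i] = box
        cls_targets[i] = 0.0
        cls_targets[i, cls] = 1.0
    return box_targets, cls_targets

# an image with two fish of different types
b, c = encode_targets([(10, 20, 50, 60), (70, 15, 120, 90)], [1, 2])
```

At prediction time, any slot whose class argmax is background is simply discarded, which (as far as I can tell) is how a fixed-size head can end up emitting "zero or more" boxes per image.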
Finally, I've been attempting to read and understand some papers (for example, on Faster R-CNN and segmentation) that seem to do this; perhaps this is a way forward (not being a math expert, it's taking me a while to get my head around it all and to see how it could be coded in Keras!).
Thanks for any pointers/discussion,