How well can a vision learner categorize intersecting data types?

Let’s say I want to label these three cases:

  1. Pictures of dogs.
  2. Pictures of cats.
  3. Pictures where there are a mixed group of cats and dogs. Maybe there’s one of each, maybe there are more.

Is this something that a vision learner model could do with a >80% success rate?

Hi @lefokingrad, yes, this should certainly be possible and there are multiple ways to achieve this. You could simply try multi (3) class classification, or you could try object detection (with two classes) as a first stage and add logic on top to distinguish between groups and or mixes

1 Like

Thank you @lucasvw!