Hi @cudawarped, I haven’t tried the methods you mentioned as I’m still getting into open-set topics.
It makes sense that visually similar classes tend to get confused. The reason is that general image classification algorithms are trained to recognize classes that are visually different from each other (coarse-grained classification), not classes that are visually similar to each other (fine-grained classification). There are model modifications we can make to tackle each type of problem, but I’m not sure there’s a general approach that handles both.
In the cats and fish example, we see confusion between the two classes due to another characteristic of current methods: the networks we train sometimes learn simple local features that don’t generalize well. That happens because we accept any supervision signal that improves absolute model accuracy instead of pushing the network toward more global features. There’s an interesting article on this topic.
If you want to improve the accuracy, I’d suggest trying self-supervised pre-training, or adding self-supervision as an auxiliary network task, since both have been found to improve generalization.
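To make the auxiliary-task idea concrete, here’s a minimal sketch of the data side of a common pretext task: rotation prediction (each image is rotated by 0/90/180/270 degrees and the network gets an extra head that predicts the rotation). This is just an illustration, not your exact setup; the function name and shapes are my own assumptions, and I’m leaving out the model and training loop.

```python
import numpy as np

def make_rotation_pretext_batch(images):
    """Build a rotation-prediction pretext batch.

    images: array of shape (N, H, W, C).
    Returns (4*N, H, W, C) rotated copies and integer labels 0..3,
    where label k means the image was rotated by k * 90 degrees.
    An auxiliary head trained to predict these labels provides a
    self-supervision signal alongside the main classification loss.
    """
    rotated, labels = [], []
    for img in images:
        for k in range(4):
            # rotate in the spatial (H, W) plane, leaving channels intact
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

# toy example: a batch of 2 random 8x8 RGB images
batch = np.random.rand(2, 8, 8, 3)
x, y = make_rotation_pretext_batch(batch)
print(x.shape, y.shape)  # (8, 8, 8, 3) (8,)
```

During training you’d feed `x` through the shared backbone and compute a cross-entropy loss on `y` from the auxiliary head, added (usually with a small weight) to the main loss.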