I have a conceptual question… What’s the impact of the number of categories on what is learned by a model?
Let’s say I have a dataset with 5 categories (A, B, C, D, and E), but what I’m really interested in is A vs. “not-A”.
I could either create a model with 5 categories [A,B,C,D,E], or one with 2 categories [“A” vs “B/C/D/E”].
Can someone provide insight (or links to blogs/publications) that will help me understand what I should be considering in making this choice? I know this is partly an empirical question (I can just compare the two models’ sensitivity in detecting ‘A’), but I’d be interested if anyone can provide some theory/concepts as well.
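
For reference, here is a minimal sketch of the empirical comparison I have in mind. It assumes I already have predictions from both models on the same held-out set; the label lists below are made-up placeholders, not real results, and the helper names `to_binary` and `sensitivity_specificity` are just my own. The idea is to collapse the 5-category predictions to A vs. not-A so both models are scored in the same label space.

```python
def to_binary(labels, positive="A"):
    """Collapse multi-class labels to `positive` vs. 'not-<positive>'."""
    negative = f"not-{positive}"
    return [positive if lab == positive else negative for lab in labels]


def sensitivity_specificity(y_true, y_pred, positive="A"):
    """Return (sensitivity, specificity) for the positive class.

    Assumes y_true contains at least one positive and one negative example.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical held-out labels and predictions (illustrative only).
y_true = ["A", "B", "A", "C", "D", "E", "A", "B"]
pred_five = ["A", "C", "B", "C", "D", "E", "A", "A"]  # 5-category model
pred_two = ["A", "not-A", "A", "not-A", "not-A", "A", "A", "not-A"]  # binary model

truth_bin = to_binary(y_true)

# Score both models on the same A vs. not-A task.
five_sens, five_spec = sensitivity_specificity(truth_bin, to_binary(pred_five))
two_sens, two_spec = sensitivity_specificity(truth_bin, pred_two)

print(f"5-category model: sensitivity={five_sens:.2f}, specificity={five_spec:.2f}")
print(f"binary model:     sensitivity={two_sens:.2f}, specificity={two_spec:.2f}")
```

Note that in this setup a mistake between, say, B and C by the 5-category model does not count against it, since both collapse to not-A.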