I understand that regression is typically used when your labels are continuous, and that classification is typically used when your labels are categorical.

My question is: even when your labels are discrete and could be treated as categories, such as age (in whole years), regression is still typically used. What’s the rule of thumb for choosing between classification and regression?

My current thoughts are that:

If you have a small number of categories (such as ages of 1-5 years only), then classification may be a better method.

If your labels are unbounded (age may be interpreted as unbounded because someone could break the record for being alive the longest), then use regression.

Speaking strictly as a non-expert, I would use regression for labels that have an order, whether continuous or not, and then apply boundaries that demarcate the output categories you are looking for. Otherwise, by using categories, you throw away a significant characteristic of the function you want to model. Generally, for ordered labels, a small change in a continuous input moves the output to an adjacent category, and you’d want to make it easier for the model to find that relationship.
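To make the "regress, then apply boundaries" idea concrete, here is a minimal sketch in plain Python. The cut points, group names, and the `to_group` helper are all made up for illustration; in practice you would pick boundaries that match the categories you actually care about.

```python
from bisect import bisect_right

# Hypothetical cut points separating four ordered age groups:
# [.., 13) child, [13, 20) teen, [20, 65) adult, [65, ..) senior.
BOUNDARIES = [13.0, 20.0, 65.0]
GROUPS = ["child", "teen", "adult", "senior"]

def to_group(predicted_age: float) -> str:
    """Map a continuous regression output to its ordered category."""
    # bisect_right counts how many boundaries are <= the prediction,
    # which is exactly the index of the group it falls into.
    return GROUPS[bisect_right(BOUNDARIES, predicted_age)]

# Pretend these came out of a regression model.
predictions = [4.2, 12.9, 13.0, 37.5, 80.1]
groups = [to_group(p) for p in predictions]
print(groups)
```

Because the model is trained on the number rather than the group, a prediction that is slightly off lands in a neighbouring group instead of an arbitrary one.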

(In other eras, I might have argued that the labels need to represent a linear scale, that 8 means twice as much of something as 4. But I think the non-linearities in ML models make that point moot.)

So my bias is to use regression whenever possible (ordered labels). And to pick a final activation function that maps to the range of the labels, as Jeremy shows in one of the lessons. It’s an easier function for the model to learn.
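For anyone who has not seen the trick, the idea is to squash the model's raw output with a sigmoid and then stretch it to the label range. Below is a rough sketch in plain Python; `scaled_sigmoid` is my own illustrative helper, not a library function (I believe fastai ships something similar, but check its docs rather than trusting my memory), and the 0-100 age range is an assumption.

```python
import math

def scaled_sigmoid(x: float, low: float, high: float) -> float:
    """Squash an unbounded raw output into the range between low and high."""
    # Numerically stable sigmoid: never calls exp() on a large positive number.
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    return low + (high - low) * s

# Raw (pre-activation) outputs a model might produce; values are arbitrary.
raw_outputs = [-5.0, 0.0, 5.0]
ages = [scaled_sigmoid(r, 0.0, 100.0) for r in raw_outputs]
print(ages)
```

Since a sigmoid only approaches its limits, it can help to set the range a little wider than the true label range so the extreme labels stay reachable.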

Once I even switched cancer/no-cancer categories to a [0,1] regression, treating the label as the number 0 or 1 and the output as a probability of cancer. With the right threshold, it worked a bit better than using categories. That was perhaps an unusual case because we can interpret a slide as having an “amount of canceriness”. That interpretation may not apply as well to other types of binary categories.
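By "the right threshold" I mean a cut-off chosen on held-out data rather than assuming 0.5. Here is a minimal sketch of one way to do that; the scores and labels are toy numbers invented for this post, not my actual results, and `accuracy_at` and `best_threshold` are hypothetical helper names.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction classified correctly when score >= threshold means positive."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def best_threshold(scores, labels):
    """Try each distinct score as a cut point and keep the most accurate one."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))

# Toy validation data, made up for illustration only.
val_scores = [0.05, 0.20, 0.35, 0.40, 0.45, 0.70, 0.90]
val_labels = [0, 0, 0, 1, 0, 1, 1]

threshold = best_threshold(val_scores, val_labels)
print(threshold, accuracy_at(threshold, val_scores, val_labels))
```

Accuracy is only one possible criterion. For something like cancer screening you would probably weight missed positives more heavily, and you should pick the threshold on a validation set, not the test set.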

As for unbounded labels, they are unbounded only potentially, not in the specific training set. So it does not make sense to me to use unboundedness as an additional criterion for regression beyond orderedness. Also, in my limited experience, regression models do very poorly when asked to predict outside the label range seen in training: OK within, very inaccurate outside. YMMV of course, and I would like to hear what you discover when you try it.

Well, that little essay sounds more authoritative than I have credentials for. I’d love to see more perspectives here.