Ordering of set_categories

I noticed that in Lesson 7 (Chapter 9), the ordering of the categories is shown opposite to how they are ordered in reality. Is this a mistake? Does it matter? And if not, is it bad practice?

Hey David,

I don’t know whether it’s a mistake, but it doesn’t matter computationally. A tree model always splits a column on a threshold (everything above the value goes into one node and everything below it goes into the other), so you can reverse the order of any column you like and the result stays the same. What matters is the ordinality: because the categories are ordered, the tree can separate, for example, “large-ish” from “small-ish” in a single split.
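
Here is a minimal sketch of that argument, not the lesson's actual code. It assumes the model sees the integer codes pandas assigns after `set_categories(..., ordered=True)` and splits them on a single threshold. The `sizes` data and the `threshold_partitions` helper are made up for illustration.

```python
import pandas as pd

# Hypothetical ordinal column, similar in spirit to a size feature.
sizes = pd.Series(["Small", "Medium", "Large", "Large", "Small", "Medium"])


def threshold_partitions(order):
    """Return every two-way grouping of the categories that a single
    threshold split on the integer codes can produce for this order."""
    codes = sizes.astype("category").cat.set_categories(order, ordered=True).cat.codes
    partitions = set()
    for t in range(1, len(order)):
        below = frozenset(sizes[codes < t])
        above = frozenset(sizes[codes >= t])
        # Store the two sides as an unordered pair: which side is "left"
        # does not change how the rows are grouped.
        partitions.add(frozenset([below, above]))
    return partitions


natural = threshold_partitions(["Small", "Medium", "Large"])
flipped = threshold_partitions(["Large", "Medium", "Small"])
scrambled = threshold_partitions(["Medium", "Large", "Small"])

print(natural == flipped)    # reversing keeps the same candidate splits
print(natural == scrambled)  # scrambling the order changes them
```

With the scrambled order, a single split can no longer put Small and Medium on one side and Large on the other. A deeper tree could still isolate those groups with extra splits, so this only shows why a sensible order makes the splits cheaper, not that a scrambled order makes the feature unusable.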

Whether it’s bad practice depends on who you ask. In my opinion, it’s a good idea to make the category order match human intuition, even if it doesn’t matter computationally.
