Lesson 7 - Official topic

francescolost · February 18, 2021, 1:01pm

I agree that max_card of 1 is weird, and it took me a while to figure out what was going on: setting max_card to 1 I got 51 categorical variables, setting it to 9000 I got 60 categorical variables. I then started investigating some of the 51 I got in the first case and I found out that they all had > 1 category.

If you look at the source code for the cont_cat_split(...) function (e.g. here), you see where the trick is: a variable is considered continuous if it has integer values and > max_card occurrences or if it has float values. In the case of the 51 categorical variables, they are all string-valued!