In lesson 4, Jeremy mentions that:
Choosing categorical vs. continuous variable is a modeling decision you get to make. In summary, if it is categorical in the data, it has to be categorical. If it is continuous in the data, you get to pick whether to make it continuous or categorical in the model.
I would like to know what the good practice is for choosing the variable type. Normally, for ML models, I use dtype == 'object' to make the decision. After seeing the power of entity embeddings, I am thinking I should use the cardinality to help make the decision. Let's say I have 10k samples: if a column has cardinality <= 100, I will treat it as a categorical variable.
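To make the idea concrete, here is a minimal sketch of the rule I have in mind, assuming the data is in a pandas DataFrame. The helper name `split_cat_cont`, the `max_card` parameter, and the toy columns are made up for illustration; only the threshold of 100 comes from the example above, and it is a guess, not a recommended value.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_cat_cont(df, max_card=100):
    """Split column names into (categorical, continuous) lists.

    Hypothetical helper: non-numeric columns (e.g. object dtype) are
    always categorical; numeric columns are treated as categorical only
    when their number of distinct values is <= max_card.
    """
    cat_vars, cont_vars = [], []
    for col in df.columns:
        if not is_numeric_dtype(df[col]) or df[col].nunique() <= max_card:
            cat_vars.append(col)
        else:
            cont_vars.append(col)
    return cat_vars, cont_vars


# Toy data (assumed, not from a real dataset)
df = pd.DataFrame({
    "store": ["a", "b", "c", "a"] * 50,   # strings: categorical in the data
    "day_of_week": [1, 2, 3, 4] * 50,     # numeric, low cardinality
    "sales": range(200),                  # numeric, high cardinality
})

cat_vars, cont_vars = split_cat_cont(df, max_card=100)
print(cat_vars, cont_vars)
```

With this rule, a numeric column like `day_of_week` ends up categorical (and would get an embedding), while `sales` stays continuous.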
Is this good practice? What would be the best cardinality threshold for this problem? Thanks.