ML lesson 3 - categorical variables feature importance

(Rick) #1

I just finished lesson 3 of the ML course and I think I’ve finally understood the principle behind the feature importance calculation.

Once we’ve trained our model and made predictions, we shuffle each variable in turn and recompute the R² score to see how much the prediction accuracy drops.
My question is: for a categorical variable, once we’ve transformed it into a cat.codes column, do we shuffle it the same way as we do for continuous variables? (It may sound obvious, but the more I learn, the more I doubt my logic.)
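
To make the shuffling step concrete, here is a minimal sketch of the idea on made-up data. Everything in it is an assumption for illustration: the toy columns (`size`, `age`, `noise`), the helper name `shuffle_importance`, and the choice of a scikit-learn `RandomForestRegressor`. It is not the course's own code. The point it illustrates is that the shuffle works on whatever values are in the column, so integer category codes are permuted exactly like floats.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 600

# Hypothetical toy data: one categorical column, one continuous, one pure noise.
sizes = pd.Series(rng.choice(["small", "medium", "large"], size=n))
age = rng.uniform(0, 10, size=n)
noise = rng.normal(size=n)
y = sizes.map({"small": 0.0, "medium": 5.0, "large": 10.0}).to_numpy() + age

X = pd.DataFrame({
    "size": sizes.astype("category").cat.codes,  # integer codes for the categories
    "age": age,
    "noise": noise,
})

# Simple positional split into training and validation sets.
X_train, y_train = X.iloc[:450], y[:450]
X_valid, y_valid = X.iloc[450:], y[450:]

model = RandomForestRegressor(n_estimators=40, random_state=0)
model.fit(X_train, y_train)

def shuffle_importance(model, X, y, rng):
    """Drop in R² when each column is shuffled on its own (hypothetical helper)."""
    baseline = r2_score(y, model.predict(X))
    drops = {}
    for col in X.columns:
        X_shuffled = X.copy()
        # Same operation whether the column holds category codes or floats.
        X_shuffled[col] = rng.permutation(X_shuffled[col].to_numpy())
        drops[col] = baseline - r2_score(y, model.predict(X_shuffled))
    return drops

importances = shuffle_importance(model, X_valid, y_valid, rng)
for col, drop in importances.items():
    print(f"{col}: R² drop = {drop:.3f}")
```

The model is not retrained here; only the validation column is permuted and the score recomputed. scikit-learn also provides `sklearn.inspection.permutation_importance`, which repeats the shuffle several times and averages the result.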

At the end of the lesson, Jeremy asked us to try changing the fiProductClassDesc variable to an ordinal one to see whether it makes any difference to the feature importance. I didn’t see any change on my side, so I’m wondering: did anybody get a different result?
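
For reference, a small sketch of what "changing to ordinal" does at the pandas level. The labels below are placeholders, not real fiProductClassDesc values. The conversion only changes which integer code each category receives; the rows and their labels stay the same.

```python
import pandas as pd

# Placeholder labels (hypothetical, not taken from the dataset).
s = pd.Series(["medium", "large", "small", "medium"], dtype="category")

# Default: categories are sorted alphabetically (large=0, medium=1, small=2).
default_codes = s.cat.codes.tolist()

# Ordinal: impose a meaningful order (small=0, medium=1, large=2).
ordered = s.cat.set_categories(["small", "medium", "large"], ordered=True)
ordinal_codes = ordered.cat.codes.tolist()

print(default_codes)   # [1, 0, 2, 1]
print(ordinal_codes)   # [1, 2, 0, 1]
```

Because only the code assignment changes, a tree-based model can often recover similar splits either way, which may be one reason the importance looks unchanged. That is a plausible explanation, not something verified on this dataset.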

Until now I’ve always checked associations between non-continuous variables with a chi² test, but I’ve never managed to calculate their feature importance. Is the method Jeremy proposes in the course a valid way to calculate feature importance for categorical and ordinal variables?