I am going through today’s notebook, and I would like to ask a few questions to check whether I’ve understood it correctly.
Here are some statements I’ve tried to make; please correct me and/or add anything I’m missing.
1) Dropping columns based on feature importance might get rid of collinear features.
2) Getting rid of collinear features will increase the importance of their collinear counterparts that are left in the model. Since both sets of features were carrying similar signals, the importance in that direction was being divided among them.
3) Once we are left with purer features (more like orthogonal vectors), we can make better interpretations of each individual feature.
4) Dropping features may give us better results: being left with purer features (a better signal-to-noise ratio?) and having a cap such as max_depth, our model might use this new subset of features to generalize better thanks to its simplicity. (This part is clearer with the data-leakage example given in class: sometimes a single column can capture the desired relationship, and additional features just add noise. But if that were the case, wouldn’t a RandomForest stop at a single split on that feature?)
5) Open question: we often prefer simpler models for the sake of better generalization. Does that mean dropping columns is always better when some features carry little signal, or should one squeeze every bit of information out of those features, without overfitting of course?
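To make points 2 and 3 concrete, here is a minimal sketch (synthetic data I made up, not from the notebook) of how tree-based feature importance gets split between two nearly collinear columns, and how it concentrates again after one of them is dropped:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # near-duplicate (collinear) copy of x1
noise = rng.normal(size=n)                  # an unrelated noise column
y = 3 * x1 + rng.normal(scale=0.5, size=n)  # the target only truly depends on x1

# With both collinear columns present, trees pick one or the other at each
# split, so the importance of the underlying signal is divided between them.
X_full = np.column_stack([x1, x2, noise])
rf_full = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_full, y)
print(rf_full.feature_importances_)  # signal importance split between x1 and x2

# After dropping x2, x1 absorbs nearly all of the importance, making the
# model easier to interpret.
X_drop = np.column_stack([x1, noise])
rf_drop = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_drop, y)
print(rf_drop.feature_importances_)
```

The same effect shows up with permutation importance, which is often recommended over impurity-based importance for exactly this kind of interpretation question.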
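On the question inside point 4 ("wouldn’t RandomForest stop at a single split on this feature?"): an individual decision tree would, but a random forest subsamples candidate features at each split (the max_features parameter), so many splits never even see the perfectly predictive column and end up splitting on noise. A small sketch with made-up data, taking this to the extreme with max_features=1:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
leak = rng.normal(size=n)
y = (leak > 0).astype(int)          # the label is a deterministic function of one column
noise = rng.normal(size=(n, 4))     # four unrelated noise columns
X = np.column_stack([leak, noise])

# With max_features=1, each split considers a single randomly chosen feature,
# so many splits never see the leaky column and the trees split on noise too.
rf = RandomForestClassifier(n_estimators=100, max_features=1, random_state=0).fit(X, y)
print(rf.feature_importances_)  # noise columns receive nonzero importance
```

So even with a single column that fully determines the target, noisy extra features can still end up in the trees, which is one way dropping them can help.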