Why do I want to use ridge regression for model ensembling? Could anyone please shed some light on the reasoning below? Does it make sense, or am I completely off track here?
My reasoning is that we want to use ridge regression because we will have correlated features: it is reasonable to assume a high degree of correlation between the outputs of the individual models. This might lead to coefficients that are far too high or far too low relative to each feature’s actual contribution, because other, correlated features will compensate.
We don’t care about the coefficients themselves because we don’t want to analyze the model; we really don’t care how each feature contributes to the final prediction. All we care about is whether the model will generalize to new data. And this is why we use ridge regression: we ‘regularize’ the coefficients, pulling them toward what they would be if the features were orthogonal, which should improve the ability of our ridge regression model to generalize.
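To make the reasoning above concrete, here is a minimal sketch comparing plain least squares with ridge regression on highly correlated model outputs. Everything in it is an assumption for illustration: the three "base models" are simulated as the target plus a shared error and a small individual error, `fit_linear` is a hypothetical helper using the closed-form solution, and `alpha=10.0` is an arbitrary penalty strength rather than a tuned value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three base models whose predictions are the true
# target plus a large shared error and a small model-specific error,
# which makes their outputs highly correlated with each other.
n = 200
y = rng.normal(size=n)
shared_error = 0.5 * rng.normal(size=n)
preds = np.column_stack(
    [y + shared_error + 0.05 * rng.normal(size=n) for _ in range(3)]
)


def fit_linear(X, y, alpha=0.0):
    """Closed-form least squares with an L2 penalty alpha (no intercept).

    alpha=0 gives ordinary least squares; alpha>0 gives ridge regression.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


w_ols = fit_linear(preds, y)               # unregularized ensemble weights
w_ridge = fit_linear(preds, y, alpha=10.0)  # ridge ensemble weights

corr = np.corrcoef(preds, rowvar=False)
print("correlation between model outputs:\n", corr.round(3))
print("OLS weights:  ", w_ols.round(3))
print("ridge weights:", w_ridge.round(3))
```

With the penalty switched on, the weight vector is guaranteed to have a smaller norm than the unregularized one, so the individual weights cannot grow large just to cancel each other out. Whether that actually improves generalization for a given ensemble still has to be checked on held-out data, for example by choosing `alpha` with cross-validation.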
What prompted me to think about this is the following interview: