Models based on Randomness vs. "Exhaustiveness"

keryums · November 6, 2017, 2:06am

Hi @jeremy, during last Tuesday’s class you mentioned that “better instantiations of random forests generally add more randomness, not less”. This was in response to the question of whether we could/should try models that include all 2- and 3-way interactions of variables. I believe this also led to the point that a model based on random sampling outperforms a model that considers all combinations of levels and factors (including all factor interactions).

Can you please elaborate on why this is the case? Is it because basing a model on randomness allows it to generalize better? Intuitively, I get why this would be preferable when the alternative misses possible interactions, but is this also true when comparing to a model that is “exhaustive” with respect to the variables and interactions it considers? Thanks!
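
For a sense of scale on the "exhaustive" alternative, here is a minimal sketch that counts how many 2- and 3-way interaction terms a given number of variables would produce. The function name `interaction_count` is hypothetical, and the sketch assumes one term per unordered combination of distinct variables; counting every combination of factor levels would give a larger number.

```python
from math import comb


def interaction_count(p: int, max_order: int = 3) -> int:
    """Count interaction terms of order 2..max_order among p variables.

    Assumption: one term per unordered set of distinct variables.
    """
    return sum(comb(p, k) for k in range(2, max_order + 1))


# Print the count for a few (arbitrary) numbers of variables.
for p in (10, 50, 100):
    print(p, interaction_count(p))
```

With 100 variables this is already 166,650 candidate terms, before any factor levels are expanded.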

jeremy · November 6, 2017, 10:18am

Take a look at this and the links there, and let me know if that helps: https://stats.stackexchange.com/questions/175523/difference-between-random-forest-and-extremely-randomized-trees
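
One standard way to see why extra randomness can help, offered here as a sketch and not necessarily the argument made in class or in the linked answer: if an ensemble averages `n_trees` predictors that each have variance `sigma2` and pairwise correlation `rho`, the variance of the average is `rho * sigma2 + (1 - rho) / n_trees * sigma2`. Adding trees shrinks only the second term, so lowering the correlation between trees is what lowers the floor. The function name and the example numbers below are assumptions for illustration.

```python
def ensemble_variance(rho: float, sigma2: float, n_trees: int) -> float:
    """Variance of the mean of n_trees identically distributed predictors
    with common variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) / n_trees * sigma2


# Illustrative numbers only: same per-tree variance, different correlation.
for rho in (0.6, 0.3, 0.1):
    print(rho, ensemble_variance(rho, sigma2=1.0, n_trees=500))
```

The formula ignores bias: more randomized trees are individually weaker, so the trade-off is not captured by this variance term alone.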

parrt · November 6, 2017, 3:48pm

That links to an interesting paper.