I am having some difficulty understanding when to subsample. My understanding was that this technique is a way to speed up experimenting with models before fitting a carefully thought-out one. The price of that speed is that no single tree sees all of the data, and therefore scores are worse.
My question is: can/should this be applied when fitting my final model? I am running into memory issues (that subsampling would help with), but from what I’ve seen in the lecture and other places, the subsampling set by set_rf_samples() was reset before fitting the final model.
Also, would it make sense to subsample, and then use a large number of estimators to try to get the model to see all (or at least most) of the data?
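To make that last question concrete, here is a rough back-of-the-envelope sketch of how much of the data the forest as a whole would see. It assumes each tree draws its rows uniformly at random with replacement, independently of the other trees; the function name `expected_coverage` and the row counts are made up for illustration and are not part of any library.

```python
def expected_coverage(n_rows, sample_size, n_trees):
    """Expected fraction of rows drawn by at least one tree.

    Assumption: every tree draws `sample_size` rows uniformly at random
    with replacement, independently of the other trees.
    """
    # Probability that one particular row is missed by a single draw.
    p_miss_one_draw = 1.0 - 1.0 / n_rows
    # Probability that the row is missed by every draw of every tree.
    p_miss_all = p_miss_one_draw ** (sample_size * n_trees)
    return 1.0 - p_miss_all


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the trend.
    n_rows, sample_size = 1_000_000, 20_000
    for n_trees in (10, 40, 100, 200):
        coverage = expected_coverage(n_rows, sample_size, n_trees)
        print(f"{n_trees:>4} trees -> {coverage:.3f} of rows seen at least once")
```

Under that assumption, coverage is approximately 1 - exp(-sample_size * n_trees / n_rows), so it depends mainly on the total number of draws relative to the dataset size. Note that this only says how many rows are seen by some tree; each individual tree is still trained on just `sample_size` rows.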