I’ve been going through Jeremy’s ML videos, and the discussion of bootstrapping with random forests got me curious. When is bootstrapping the right thing to do?
For example, I don’t think we use bootstrapping at all in the DL course, but I can imagine a training loop that bootstraps minibatches, that is, draws them with replacement. (I’m not quite sure what an epoch would mean in the context of bootstrapping, though: do you keep bootstrapping until you’ve seen every training example, or until you’ve drawn len(x_trn) samples, or…?)
I think that sounds like a cool idea! I’ve thought about this as well, and at some point I’ll probably try training a lot of weak NN learners to see how it goes, though I don’t expect to get very far with it.
As for why we use bootstrapping with random forests: I think we want to decrease the bias of our trees. If we didn’t vary the data at all and there were no randomness in picking which columns to consider, every tree would learn the same thing, and we’d never get to look at a lot of the signal our dataset contains.
I suspect that even with massive datasets (which translate into deeper trees) we only get so many splits, and the expressiveness of a single tree is substantially smaller than that of a NN, hence the emphasis on reducing bias. Once we have a lot of trees and have effectively reduced the bias, we ensemble them to decrease the variance of any single tree, so the two steps work in tandem.
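A quick numpy sketch of what bootstrapping the rows actually gives each tree (the variable names are mine): sampling n rows with replacement means each tree sees only about 63% of the unique training examples, so different trees really do train on meaningfully different data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                    # pretend training-set size
idx = np.arange(n)

# One bootstrap sample: draw n row indices *with* replacement
sample = rng.choice(idx, size=n, replace=True)

# Fraction of unique rows the sample contains; in expectation this is
# 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 for large n
unique_frac = len(np.unique(sample)) / n
print(f"unique rows in one bootstrap sample: {unique_frac:.3f}")
```

The ~37% of rows a given tree never sees is also what makes out-of-bag scoring possible.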
I’m also quite new to all this. I tried reading this just now, which is what Jeremy shared in the comments section of the ML videos, and it’s a really, really great read. I wanted to find a bit more theory to share with you, but there’s nothing there on why bootstrapping is the way to go with random forests. I suspect the paper (the top link on this page, I believe) might speak to the theoretical underpinnings, though.
Sorry for typing this out in such a hurry, but I’ve gotta run. Maybe the links will be useful to you, though!
Cool, thanks, I’ll check out the links! Another link I stumbled on that seems very interesting: https://stats.stackexchange.com/questions/26088/explaining-to-laypeople-why-bootstrapping-works
Just to clarify one part of my question: I’m curious whether it would make sense to bootstrap while training even a single NN (so not as a means of ensembling NNs that have each been trained on somewhat different versions of the training data). I’m not sure whether this makes sense or would end up working any differently from the normal approach, but to be as bootstrappy as possible, imagine doing away with the concept of an epoch entirely: each minibatch is freshly sampled with replacement from the training set, and you just keep sampling and taking gradient descent steps until you’re satisfied.
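To make the idea concrete, here’s a minimal numpy sketch of that epoch-free loop (the toy linear-regression objective and all names are my own, just for illustration): every step draws a fresh minibatch with replacement and takes one SGD step, until a fixed step budget runs out.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: y = 3x + 1 plus a little noise
x_trn = rng.normal(size=(1_000, 1))
y_trn = 3 * x_trn[:, 0] + 1 + 0.1 * rng.normal(size=1_000)

w, b = 0.0, 0.0
lr, bs, n_steps = 0.1, 32, 2_000   # no epochs, just a step budget

for _ in range(n_steps):
    # "Bootstrap" a minibatch: sample indices with replacement
    i = rng.integers(0, len(x_trn), size=bs)
    xb, yb = x_trn[i, 0], y_trn[i]
    err = (w * xb + b) - yb        # dMSE/dpred (up to a factor of 2)
    w -= lr * (err * xb).mean()
    b -= lr * err.mean()

print(f"w ≈ {w:.2f}, b ≈ {b:.2f}")  # should end up near 3 and 1
```

The only difference from a standard loop is the `rng.integers` sampling with replacement instead of a shuffled pass over the data, so some examples get seen several times per “budget” and others not at all.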
This has been studied, and it was found not to work as well as cycling through epochs. See this discussion for more details: https://stats.stackexchange.com/questions/242004/why-do-neural-network-researchers-care-about-epochs, as well as this paper: https://arxiv.org/pdf/1510.08560.pdf