I think that sounds like a cool idea! I have thought about this as well, and at some point I will probably try to train a lot of weak NN learners and see how it goes, though I don't expect to get very far with it.
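Just to make the idea concrete, here's a rough sketch of what I have in mind - bag a bunch of deliberately tiny, under-trained networks and average their predictions. Everything here (MLPRegressor on toy data, the sizes, the iteration counts) is just a placeholder, not a recipe I've actually validated:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Toy regression data, just for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
nets = []
for i in range(20):
    # Bootstrap: draw n rows with replacement so each net sees different data
    idx = rng.integers(0, len(X), size=len(X))
    # "Weak" learner: a deliberately tiny, barely-trained network
    net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=50, random_state=i)
    net.fit(X[idx], y[idx])
    nets.append(net)

# The ensemble prediction is just the average over the individual nets
preds = np.mean([net.predict(X) for net in nets], axis=0)
```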
As for why we use bootstrapping with random forests - I think the point is to decorrelate our trees rather than to reduce their bias. If we didn't vary the data at all and there were no randomness in picking which columns to look at, each tree would learn the same thing, and averaging identical trees would buy us nothing - we'd never get to explore a lot of the signal our dataset contains.
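Here's roughly how I picture those two sources of randomness, as a from-scratch sketch with sklearn's DecisionTreeRegressor (this is not how RandomForestRegressor is actually implemented internally, just the idea):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        # Randomness source 1: bootstrap the rows (sample with replacement)
        idx = rng.integers(0, len(X), size=len(X))
        # Randomness source 2: at each split, only consider a random
        # subset of the columns (that's what max_features does)
        tree = DecisionTreeRegressor(max_features="sqrt", random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees
```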
I suspect that even with massive datasets (more data generally means deeper trees) we only get so many splits, and a single tree is substantially less expressive than a NN. A fully grown tree also has low bias but high variance - it overfits whatever bootstrap sample it was trained on. Once we have a lot of decorrelated trees, we ensemble them by averaging, which keeps the low bias but cancels out much of the variance of any single tree, so the two steps work in tandem.
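A quick way to see the two steps working in tandem, reusing the fit_forest sketch above on toy data (the numbers it prints are illustrative only):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

trees = fit_forest(X_tr, y_tr, n_trees=100)

# One fully grown tree: low bias, high variance (it overfits its sample)
single_mse = mean_squared_error(y_te, trees[0].predict(X_te))

# Averaging many decorrelated trees keeps the low bias but
# cancels out a lot of the variance
avg_pred = np.mean([t.predict(X_te) for t in trees], axis=0)
ensemble_mse = mean_squared_error(y_te, avg_pred)

print(f"single tree MSE: {single_mse:.1f}, ensemble MSE: {ensemble_mse:.1f}")
```

On data like this the ensemble MSE should come out well below the single-tree MSE, which is the variance reduction doing its job.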
I am also quite new to all this - I tried reading this just now (it's what Jeremy shared in the comments section on the ML videos) and it's a really, really great read. I wanted to find a bit more on the theory to share with you, but there is nothing there on why bootstrapping is the way to go with random forests. I suspect the paper (the top link on this page, I believe) might speak to the theoretical underpinnings, though.
Sorry for typing this out in such a hurry, but I've gotta run - maybe the links will be useful to you though!