I am wondering if anyone can provide advice on when it’s best to use deep learning for regression tasks on tabular data, rather than using something else like a random forest or linear model?
I don’t have great rules of thumb to add, but I’d say just chuck your data into a DataBunch, fire up a learner, and see what results you get. I believe Jeremy said in the latest course that he uses DL/NNs as his default in ~90% of cases, with the rest going to tree-based methods.
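To make the "try both and see" approach concrete, here is a rough sketch using scikit-learn rather than fastai (the exact DataBunch/learner API depends on your fastai version, so I'm showing the baseline-comparison idea instead); the dataset here is synthetic and purely illustrative:

```python
# Quick baseline comparison on a synthetic tabular regression task.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Fake tabular data: 2000 rows, 20 numeric columns, linear target + noise.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit two cheap baselines and compare held-out R^2 before reaching for a NN.
for model in (RandomForestRegressor(n_estimators=100, random_state=0), Ridge()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(r2_score(y_te, model.predict(X_te)), 3))
```

If neither baseline is close to what you need, that's when it's worth spending time tuning a neural net.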
Thanks Kyle. I wonder, though, whether the preference for DL over a random forest is influenced by factors such as how much data is available for training?
I was thinking about this question and then came across this post, so I want to add a few thoughts. I do think it has to do with the number of samples relative to the number of columns (features). A random forest's complexity is governed mainly by the number of trees and their depth, and it only ever looks at one column per split. A neural network's first linear layer, by contrast, has a weight for every column times the width of that layer, so a data set with a lot of columns can easily give a network a lot of parameters to train. And if the number of samples is not large enough for that parameter count, the neural network's results will not be good.
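To put a number on that, here is a tiny parameter-count sketch for a small tabular MLP; the column count and layer sizes are made up for illustration:

```python
# Rough parameter count for a small regression MLP on tabular data.
n_cols = 50            # hypothetical number of input columns
hidden = [200, 100]    # hypothetical hidden-layer widths
sizes = [n_cols] + hidden + [1]  # input -> hidden layers -> single output

# Each linear layer contributes (in * out) weights plus out biases.
n_params = sum(i * o + o for i, o in zip(sizes, sizes[1:]))
print(n_params)  # 30401
```

So even a modest two-hidden-layer network on 50 columns has ~30k trainable parameters, which is why the sample count matters so much more for a NN than for a forest.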
These are just my thoughts. I hope someone can provide more insight on this.