When is Deep Learning preferable to Machine Learning using Structured Data?

tbrueck · January 29, 2019, 9:21am

Hey everyone,
I remember Jeremy saying that for Deep Learning you can almost always say the more data the better. I also found that with small amounts of data, Machine Learning techniques tend to be more accurate. Is there a rule of thumb for the point at which Deep Learning is definitely preferable to Machine Learning?

cstorm125 · January 29, 2019, 9:29am

I used to believe that tree-based algorithms such as random forest perform better with smaller datasets. Using embeddings and fully-connected layers, as in the fastai deep learning course, changed my mind. Now I always start with both random forest AND deep learning. In most cases, the deep learning approach is at least as good as random forest.

fabris · January 29, 2019, 10:06am

Hi Torben, don’t forget deep learning is actually a sub-field of machine learning. Neural nets are so popular nowadays because they require less data cleaning and almost no feature engineering. Everything is in the architecture design and the training strategies/tricks. Structured data such as tabular data, or even worse graphs, has been studied by DL researchers for a while. So far there is no widely accepted approach. To answer your question: it depends on your specific task and your data, and I am not aware of any rule of thumb.

Tchotchke · January 29, 2019, 2:24pm

Using fastai 0.7 I did some comparisons between LightGBM and the structured data classifier in fastai (which uses entity embeddings) on the TalkingData Kaggle competition. I found that LightGBM outperformed fastai over the whole dataset, though the top 10% or so of the predictions from fastai were better. Note that these results are on a very large dataset: over 100 million observations.

One important note - LightGBM trained significantly faster - I forget the exact numbers, but it was around an order of magnitude (i.e., 10x) faster than the deep learning model and it didn’t require a GPU.

If you haven’t read the paper, I’d suggest looking at Entity Embeddings of Categorical Variables (Guo and Berkhahn), which discusses how the authors used the deep learning approach to do really well in the Rossmann Store Sales Kaggle competition.
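For anyone unfamiliar with the idea, here is a minimal sketch of what an entity embedding does at the input of a network. It is not the paper's or fastai's implementation. The `EntityEmbedding` class, the `build_input` helper, and the column names are all hypothetical, and the vectors are randomly initialised rather than learned. Each categorical value is looked up in a small table of dense vectors, and the vectors are concatenated with the continuous features before being fed to fully-connected layers.

```python
import random


class EntityEmbedding:
    """Lookup table mapping each category to a dense vector (hypothetical helper)."""

    def __init__(self, categories, dim, seed=0):
        rng = random.Random(seed)
        # Assign each distinct category a row index.
        self.index = {cat: i for i, cat in enumerate(sorted(set(categories)))}
        # Random initial values; in a real model these weights are learned by backprop.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in self.index
        ]

    def __call__(self, category):
        return self.weights[self.index[category]]


def build_input(row, embeddings, continuous_cols):
    """Concatenate the embedding vectors with the continuous features of one row."""
    features = []
    for col, emb in embeddings.items():
        features.extend(emb(row[col]))
    features.extend(float(row[c]) for c in continuous_cols)
    return features


# Made-up toy rows, purely for illustration.
rows = [
    {"store": "A", "weekday": "Mon", "distance_km": 1.2},
    {"store": "B", "weekday": "Tue", "distance_km": 3.4},
    {"store": "A", "weekday": "Tue", "distance_km": 0.7},
]
embeddings = {
    "store": EntityEmbedding([r["store"] for r in rows], dim=2),
    "weekday": EntityEmbedding([r["weekday"] for r in rows], dim=3),
}

x = build_input(rows[0], embeddings, ["distance_km"])
print(len(x))  # 2 (store) + 3 (weekday) + 1 (continuous) = 6
```

Rows that share a category share the same vector, which is what lets the network learn similarities between categories instead of treating them as unrelated one-hot columns.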

I think we can’t say one way or another which approach is better and under what circumstances, in large part because the deep learning approach is relatively new and I haven’t seen it used much on Kaggle yet. It seems like people still tend to favor XGBoost or LightGBM, which historically have done really well.

tbrueck · January 30, 2019, 4:41pm

Thanks guys, these are some great replies. I had always thought the same: that you probably just have to try both and that there is no commonly used approach. Thanks for your work!

ctwardy · February 5, 2019, 3:57pm

Deep Learning’s superpower is automated feature engineering. One of my mentors said that most of the science in machine learning was deciding what features to measure in the first place. Better features usually dominated better algorithms.

Once you have the key features, you may be able to find a lighter-weight algorithm that does as well or better (as I think FastText did with the results of Word2Vec), but especially in rich domains like images and text, the features are likely to be complex and maybe more easily discovered by deep nets than engineered from first principles.

In feature-poor environments like time series forecasting, it’s still amazingly hard to beat simple exponential smoothing. Only recently did someone finally do so, using a combination of hand-tuned exponential smoothing and RNNs. The pure machine learning methods (including neural nets) were trounced.
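To show how little machinery that baseline needs, here is a minimal sketch of simple exponential smoothing in plain Python. The function names, the choice to initialise the level with the first observation, and the toy series are assumptions made for illustration. Each new level is a weighted average of the latest observation and the previous level, and the one-step-ahead forecast is just the last level.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels of `series` for a smoothing factor `alpha`."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    # Assumption: initialise the level with the first observation.
    levels = [float(series[0])]
    for value in series[1:]:
        # New level = alpha * observation + (1 - alpha) * previous level.
        levels.append(alpha * value + (1.0 - alpha) * levels[-1])
    return levels


def forecast_next(series, alpha):
    """One-step-ahead forecast: the most recent smoothed level."""
    return simple_exponential_smoothing(series, alpha)[-1]


# Made-up toy series, purely for illustration.
demand = [10.0, 12.0, 11.0, 13.0]
print(forecast_next(demand, alpha=0.5))  # 12.0
```

A larger `alpha` tracks recent observations more closely, while a smaller one smooths more heavily. In practice `alpha` is usually chosen by minimising forecast error on held-out data.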

Amitjindia98 · February 14, 2019, 10:26am

Machine Learning is a set of algorithms that parse data, learn from it, and then apply what they’ve learned to make intelligent decisions. Deep Learning is a subset of Machine Learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones.