Is this course still relevant? (From a maker's point of view)

I had a rough look at the contents of this course and found that it covers Random Forests and deep learning. So for those who have already completed a deep learning course, this course is essentially a Random Forest course.
And since we can now do deep learning on tabular data, why use Random Forests at all?

Jeremy said that when he approaches problems himself, some problems work better with tabular deep learning and some work better with Random Forests. But from that remark, I get the feeling the difference would be something like 1% accuracy. Unless the goal is winning a competition, that isn't a big difference, is it?
If the purpose of using deep learning is making things, it would be inefficient to spend time learning Random Forests or other machine learning methods. What do you think?

4 Likes

Random forests and GBMs are still powerful baselines for many tabular tasks, so I believe they are still worth learning.
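For anyone new to this, a minimal baseline sketch with scikit-learn (synthetic data standing in for a real dataset):

```python
# Minimal tabular baselines: a random forest and a gradient-boosted model.
# make_classification is just a stand-in for your own dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```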

2 Likes

Actually, when I was doing Kaggle competitions on tabular data last fall, LightGBM was beating deep learning most of the time.

1 Like

I'm modelling a classification task with 150k rows of private tabular data, and the best F1 score I can achieve with deep learning is 0.25. Both Random Forest and Gradient Boosting get me above 0.4, so I'd say this course deserves to be around for a while :wink:.

Also, the lessons on explainability are extremely useful and important for many commercial applications. Deep learning models aren't as readily explainable (yet).
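To make that concrete, here's a minimal sketch of the kind of explainability a random forest gives you almost for free (synthetic data; your own features and target would go here):

```python
# Two cheap ways to ask a random forest "which features matter?"
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=10, n_informative=4, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0).fit(X_train, y_train)

# Impurity-based importances: fast, but can be biased toward high-cardinality features
print(rf.feature_importances_)

# Permutation importance on a validation set is usually a more honest measure
result = permutation_importance(rf, X_valid, y_valid, n_repeats=10, random_state=0)
print(result.importances_mean)
```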

2 Likes

I am grateful to everyone who has answered my question so far. But that raises the inverse question: if Random Forests (or similar) are better than deep learning for tabular data in most cases, is deep learning on tabular data ever useful? If so, where should it be used?

Traditional ML like Random Forests teaches you to think about feature engineering, which is one of the most important aspects of learning. Humans can relate to engineered features because they rely on domain expertise and intuition. With deep learning, feature engineering is usually considered a function of the network architecture (i.e. you have to find the appropriate architecture for your problem). It is more of a search problem (trial and error) than common sense ("these features make sense"), and the former approach does not lend itself to making models explainable. If you want to be able to explain the output of a model, it is worth investing time in traditional ML.
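As a toy illustration of the kind of human-readable engineered features meant here (column names are made up):

```python
# Hand-crafted features a domain expert can reason about directly.
import pandas as pd

df = pd.DataFrame({
    "saledate": pd.to_datetime(["2019-01-15", "2019-06-30"]),
    "price": [12_000, 18_000],
    "usage_hours": [400, 1200],
})

df["sale_month"] = df["saledate"].dt.month            # seasonality
df["sale_dayofweek"] = df["saledate"].dt.dayofweek    # weekday effects
df["price_per_hour"] = df["price"] / df["usage_hours"]  # wear-adjusted price
```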

1 Like

Applying deep learning to computer vision and NLP wasn't always a hot thing; it was only in recent years that it became practical and popular. What I am trying to say is that in the past deep learning wasn't very 'useful' in those two fields, and it is only because researchers persisted that we managed to reach where we are today. Who knows? Maybe deep learning hasn't been working well on tabular data because we have been missing something crucial.

1 Like

Exactly. As I have mentioned in another thread, this course is not only still relevant, it should be updated or extended to include LightGBM and XGBoost, both of which are producing world-class results on tabular data, somehow (and this is exactly what I would like to see explained in any future courses).
Besides, this foundation in machine learning is extremely helpful when moving on to deep learning. Also, the kind of data exploration Jeremy does on the Bulldozers dataset using Random Forests, and the kind of insights he finds, are exactly what the average corporate customer requires. If you don't understand the data in this way, you're not likely to build the best models.

1 Like

So is DL on tabular data, at least for now, useless?
My intuition is that if DL on tabular data has any advantage that decision-tree-based ML does not, it may be for cases where the data can't cover the full range of possible values (e.g. time-related values). Is DL on tabular data useful in that case? Or are models such as LightGBM better than DL on tabular data for every problem?

I'm having decent results on some research I'm conducting, so definitely don't rule it out (it's just application-dependent).

1 Like


I'll leave you to that, although RFs are mostly competitive with NNs on tabular data. In fact, I'll almost always start with an RF when tackling tabular data since it's so quick to train, and it can also serve as a sanity check that my preprocessing was decent before I spend a lot of time training a NN.
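A minimal sketch of that sanity check, assuming the synthetic data below is replaced with your own preprocessed features and target:

```python
# Quick random-forest sanity check before investing time in a NN.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)  # stand-in data

rf = RandomForestClassifier(n_estimators=100, oob_score=True, n_jobs=-1, random_state=0)
rf.fit(X, y)

# If the out-of-bag score is near chance level, the preprocessing or features
# probably need another look before training anything more expensive.
print(rf.oob_score_)
```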

2 Likes

DL on tabular data is not entirely useless. It's a bit hard to train, as there has been very little research on using deep NNs on tabular data (except for the feature-embedding work, I think), but several hard-core Kaggle grandmasters use it as part of their winning solutions (TalkingData, PetFinder, …).

By the way, there is a gradient boosting tutorial by Jeremy and Terence Parr (they work together at USF and are writing an ML book) which covers GBMs nicely: http://explained.ai/gradient-boosting/index.html
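The tutorial goes much deeper, but as a rough flavour of the loop it explains, here's a toy sketch of gradient boosting for squared error (illustrative only, not code from the tutorial):

```python
# Core gradient-boosting idea for squared error:
# each new tree is fit to the residuals of the current ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

learning_rate, n_rounds = 0.1, 100
prediction = np.full_like(y, y.mean())   # start from the mean prediction
trees = []

for _ in range(n_rounds):
    residuals = y - prediction           # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print(np.mean((y - prediction) ** 2))    # training MSE shrinks as rounds increase
```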

5 Likes

This is the one case I know of where DL did well on Kaggle: Michael Jahrer's winning solution to the Porto Seguro competition.

2 Likes

The Porto Seguro example pops up a lot when people want to demonstrate the ability of DL to model tabular data. Taking a closer look shows that the DL approach actually got only a very slight improvement over LightGBM while using far more resources. I agree this can be the difference between 1st and 10th place on Kaggle, but in any real-life problem it won't be a significant difference, while the computational difference very much is.

LightGBM is fast and very easy to develop with. I actually interpret the results of the winning Porto Seguro solution as an example in favor of the power of gradient boosting!
Having said that, I'm sure DL has its own niche in tabular data. One thing that really stands out is the ability to create embeddings of categorical data, as we have in fast.ai's tabular learner and as shown in the DL course in the lesson about the Rossmann competition.
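For anyone curious what those categorical embeddings look like under the hood, here's a bare-bones sketch in plain PyTorch (not the fastai implementation; all sizes are made up):

```python
# A categorical column is mapped to a learned dense vector, which is then
# concatenated with the continuous features before the fully connected layers.
import torch
import torch.nn as nn

n_stores, emb_dim, n_cont = 1000, 16, 10   # e.g. 1000 store ids, 10 continuous columns

class TabularNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.store_emb = nn.Embedding(n_stores, emb_dim)   # one learned vector per category
        self.head = nn.Sequential(
            nn.Linear(emb_dim + n_cont, 100), nn.ReLU(),
            nn.Linear(100, 1),
        )

    def forward(self, store_id, x_cont):
        x = torch.cat([self.store_emb(store_id), x_cont], dim=1)
        return self.head(x)

model = TabularNet()
out = model(torch.randint(0, n_stores, (32,)), torch.randn(32, n_cont))
print(out.shape)   # torch.Size([32, 1])
```

After training, the rows of `store_emb.weight` are the learned per-category vectors usually referred to as entity embeddings.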

5 Likes

Can you explain more about what niche DL on tabular data has? If it is not practical to model tabular data with DL, then what is the point of creating embeddings of categorical values with DL? Adding the embeddings to the table and training a tree-based model on it, i.e. splitting one value into several columns?

1 Like

Hi,
"Neural networks" and "traditional machine learning" are not necessarily in an "either/or" situation.
It is not a case of "take one of them and leave the other one forever."

Note that NNs have triumphed over other approaches in the last few years, and this trend doesn't seem to have come to an end.
But there are still legitimate use cases for traditional ML:

  • A baseline (just for the sake of having a baseline, or for evaluating a validation set, or both).
  • You just want a faster and simpler model, and traditional ML gives you an accuracy you can live with. Maybe you don't even want to check whether a NN performs better, because you have met your requirements and you are done.
  • Jeremy explains a use case in which a NN and traditional ML (via embeddings) work together effectively in DL Part 1 2019, Lesson 5 (the video starts at the relevant timestamp).

For some problems, traditional ML can give better results; as far as I know, mostly tabular ones (don't quote me on that, though).
When it comes to NNs and tabular data, these are some things that come to mind that you could look into:

  • embeddings for categorical variables,
  • autoencoders for feature extraction,
  • RNNs for time series

By traditional ML, I mean linear models and tree-based algorithms.

Hope this helps,
Selçuk

4 Likes

I've spent some time trying to outperform GBMs with NNs on tabular tasks, and so far I have not succeeded in a generally consistent way. As @selcuk mentioned too, I think there are some benefits to NNs that can be exploited for tabular data, but I can't give a direct recipe for a general scheme. I can, however, give some examples of things I have tried that worked:

  1. A convolution is a fantastic way to learn and represent spatial (as in images) or temporal (as in time series) relations. A decision tree can't do this efficiently, as we all know from the image domain. So for these domains I believe that a NN can excel over GBMs.

  2. NNs are probably better for extrapolation and data-generation problems, as NNs construct a continuous, "functional" model (regularization is key here). As far as I understand, when you ask a decision tree for a prediction outside of its training data domain, it will only guess the nearest extreme value within the data (see the sketch after this list). For generation of new data, Bayesian frameworks for NNs can turn them into powerful generators, which may later be used to augment tabular data and get an edge on traditional problems.

  3. Autoencoders are an amazing niche of DL models that can be used to reduce the dimensionality of a dataset, find hidden relations and interactions in the data, and cluster it into significant groups in an unsupervised manner.

  4. Transfer learning. This technique is amazing for image recognition, and recently for NLP too. It is definitely more complicated with tabular data, but nonetheless I think a NN could develop some helpful long-range understanding (or bad bias, depending on your point of view…) of typical or recurring tabular fields such as day of week, country, zip codes, etc.
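To make point 2 concrete, here's a toy sketch with scikit-learn (illustrative only):

```python
# A tree flat-lines outside its training range, while a linear model
# (or a NN with a suitable inductive bias) can extrapolate the trend.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

X_train = np.arange(0, 10, 0.1).reshape(-1, 1)
y_train = 2 * X_train.ravel() + 1                    # simple linear trend

tree = DecisionTreeRegressor(max_depth=5).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

X_test = np.array([[20.0]])                          # well outside the training range
print(tree.predict(X_test))   # stuck near the largest training targets (~20)
print(lin.predict(X_test))    # follows the trend (~41)
```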

I'm sorry I can't give more concrete examples/links right now, but hopefully in the near future I (or other people) will be able to…

5 Likes

I too have tried for several years to make DL work with tabular data, and I too have been unsuccessful. Michael Jahrer's Porto Seguro solution seemed like a beacon of hope, but my denoising autoencoders have not really been competitive with GBMs.

Perhaps the solution is out there, but I still haven't stumbled upon it.

Also, the promise of entity embeddings derived from a DL model and used as features in other models has not really been fulfilled in my experience. I've tried to represent categorical variables by learnt representations from DL models and have found them mostly useless as features for GBMs. I also think they have limited or no value for linear models (GLMs, logistic regression, etc.), because it's highly unlikely that they have a linear relationship with the response variable, or even a relationship that can be captured by a polynomial of sufficient order.
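For readers unfamiliar with that workflow, here's a hypothetical sketch of appending learned embeddings as extra GBM features (names and sizes are invented, and the embedding layer is a stand-in for one you'd have trained already):

```python
# Look up each row's learned category vector and append it as extra columns
# for a tree-based model to consume.
import numpy as np
import pandas as pd
import torch.nn as nn

n_categories, emb_dim = 50, 8
trained_emb = nn.Embedding(n_categories, emb_dim)     # stand-in for a trained embedding layer

df = pd.DataFrame({"category_id": np.random.randint(0, n_categories, size=1000)})

emb_matrix = trained_emb.weight.detach().numpy()      # shape: (n_categories, emb_dim)
emb_cols = pd.DataFrame(
    emb_matrix[df["category_id"]],
    columns=[f"cat_emb_{i}" for i in range(emb_dim)],
    index=df.index,
)
df_for_gbm = pd.concat([df, emb_cols], axis=1)        # feed this to the GBM
```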

Just my $0.02.

2 Likes

A bit late to this question, but I hope to give a useful answer for future readers.

Questions frequently come with assumptions, and in my view there are a few wrong ones here.

  • Fast.ai courses are always more than the tools they show. Jeremy's and Rachel's insights are great for obtaining a deeper understanding of the concepts. The mindset of "I already read a book on that" is usually a mistake, and even more so with fast.ai courses.

  • About tools and machine learning: the "one recipe for any problem" mindset is another mistake. Some models are more powerful than others, yes, but it always depends on the task and dataset. The key is a proper understanding of the given problem. Nothing will beat linear regression if you have to fit a line.

  • About Random Forest vs. deep learning: at least today, in 2019, the answer is that decision-tree-based ensembles reign for tabular data, both in real business and in data science competitions. It is not by accident that Jeremy chose RF for an introduction to machine learning. Boosting is usually more powerful than RF, but again not always; it depends on the data (I know this first-hand, having got 2 Kaggle golds this year with single Random Forest models).

So if the tool is the only concern, then the answer is yes, RF is fully relevant today. But I recommend checking the general assumptions above to widen your learning horizon in ML.

4 Likes

… and the inverse version of the first question again rests on the wrong assumption of "one recipe to rule them all" (see my comments on assumptions above).

But, more concretely, to realize when DL can outperform DT ensembles, it is necessary to understand the different ways the two kinds of models behave and the data you have. Some questions to ask:

  • What kind of feature interactions are there, and how important are they?
  • Are the features categorical? Continuous?
  • Is co-occurrence among categorical features sparse?
  • Is augmentation viable?
  • How noisy is the data?
  • Should we expect predictions outside the ranges seen in training?

And more. Then, as always, experimenting will give a clearer answer about the best approach for the problem.

4 Likes