Some Baselines for other Tabular Datasets with fastai2

For now, just validation accuracy. Will post later to kaggle and keep you informed.

2 Likes

There is also a test file available with a million test examples. I have evaluated on that and got about the same accuracy as for my validation set. I only had about 86% though, not 99…

2 Likes

I mean with labels, here: https://archive.ics.uci.edu/ml/datasets/Poker+Hand

1 Like

While the TabNet visualization is quite unreadable, the fact that it uses attention can be used to produce a bar plot showing, for a given prediction, how important each feature was (which is not the usual meaning of feature importance).

This can already be done with tools such as Shap but it is nice to have it baked into the model and easily accessible.

That is, to me, the main appeal of TabNet: I can train a model and then produce predictions with a plot explaining which features led to that decision, and have it reviewed by a domain expert.
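To make the idea concrete, here is a minimal sketch of turning per-step attention masks into a per-prediction bar plot. It is a simplification and not TabNet's actual aggregation (which also weights each decision step); the mask values, feature names, and the helpers `per_prediction_importance` and `text_bar_plot` are all hypothetical.

```python
def per_prediction_importance(step_masks):
    """Aggregate attention masks for ONE sample into normalized scores.

    step_masks: one list per decision step, each holding one non-negative
    weight per feature. Simplification: steps are summed with equal weight.
    """
    n_features = len(step_masks[0])
    totals = [0.0] * n_features
    for mask in step_masks:
        for j, w in enumerate(mask):
            totals[j] += w
    s = sum(totals)
    return [t / s for t in totals] if s > 0 else totals


def text_bar_plot(names, scores, width=40):
    """Render the scores as a plain-text bar plot, most important first."""
    lines = []
    for name, score in sorted(zip(names, scores), key=lambda p: -p[1]):
        lines.append(f"{name:>10} | {'#' * round(score * width)} {score:.2f}")
    return "\n".join(lines)


# Hypothetical masks for a single row, three decision steps
names = ["suit_1", "rank_1", "suit_2", "rank_2"]
masks = [
    [0.1, 0.6, 0.0, 0.3],
    [0.0, 0.5, 0.1, 0.4],
    [0.2, 0.4, 0.0, 0.4],
]
scores = per_prediction_importance(masks)
print(text_bar_plot(names, scores))
```

The scores sum to one for each row, so the plot reads as "share of attention" for that single prediction rather than as a global importance.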

1 Like

Submitted to Kaggle on 1MM test records. Score: 0.99451.

6 Likes

So we should look at using Shap with fastai and provide examples for people possibly. While it’s not in the model itself, it’s still available ideally in a line of code.

I believe most people will go to TabNet for the promised performance boost (benchmarks help a lot here), but yes, easy integration with Shap (or something similar) would make fastai’s models strictly better in my eyes (overall, I feel that fastai is still lacking in the interpretation domain).

(by the way, if someone implements it, do not hesitate to send a PR or message to integrate it in the interpretation section of my fastai extensions repository list)

2 Likes

IIRC it has been done in fastaiv1, I’ll look at porting it over when I can.

Said post:

(But obviously I’ll make a notebook describing it)

2 Likes

Ping me once you have an example notebook :slight_smile:

(If you store a working draft in a dedicated repository, I can help extract the functions, document and clean up the code if you want to)

@nestorDemeure you wanted your ping :wink: https://github.com/muellerzr/fastai2-SHAP/blob/master/SHAP_fastai2.ipynb

There are probably more than a few ways it can be improved before being brought into fastai2; this is just quick and dirty while I have the time :slight_smile:

3 Likes

Thank you, you can expect a PR this weekend :slight_smile:

(in the meantime I added the link to the fastai extensions repository)

1 Like

Great! I would love more explanation in the notebook too, if possible :slight_smile: (if not, I’ll get to it later; I’m getting familiar with shap as I go)

Big thanks to @JonathanR, his v1 code helped tremendously

(also grabbing cols will be easier in the next version update)

1 Like

I will start it once I finish porting manifold mixup to V2 (so Sunday, I think).

My aim is to put the code in a .py file (that people can just dump in their projects), refactor it to improve the API if possible, and add some docs, a readme with explanations/links and a demo notebook to illustrate it.

(and maybe a V1 equivalent)

1 Like

Look into nbdev! (if you haven’t already). If I can’t port my implementation over (for whatever reason) into the fastai2 library directly, I’ll be doing the same so people can just pip install fastaishap

1 Like

> This can already be done with tools such as Shap but it is nice to have it baked into the model and easily accessible.

> So we should look at using Shap with fastai and provide examples for people possibly. While it’s not in the model itself, it’s still available ideally in a line of code.

I would like to point out one relevant aspect: the required assumptions.

SHAP uses interventional substitutions of the feature values and makes predictions based on those modified samples. It requires independence between the features if we do not want to cause domain shift. This was pointed out recently in https://arxiv.org/pdf/1910.13413.pdf and earlier in https://christophm.github.io/interpretable-ml-book/shap.html.
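A toy illustration of that domain shift, as a sketch only: the data is made up so that `x2` is fully determined by `x1`, and `intervene_on_x1` is a hypothetical helper that mimics an interventional substitution (it is not a SHAP call).

```python
import random

random.seed(0)

# Hypothetical data: x2 is fully determined by x1 (x2 = 2 * x1)
data = [(x1, 2 * x1) for x1 in (random.uniform(0, 1) for _ in range(200))]


def on_manifold(row, tol=1e-9):
    """True if the row respects the dependence seen in the training data."""
    x1, x2 = row
    return abs(x2 - 2 * x1) <= tol


def intervene_on_x1(row, background_row):
    """Replace x1 with a background value while leaving x2 untouched."""
    return (background_row[0], row[1])


modified = [intervene_on_x1(row, random.choice(data)) for row in data]

frac_valid_before = sum(on_manifold(r) for r in data) / len(data)
frac_valid_after = sum(on_manifold(r) for r in modified) / len(modified)
print(frac_valid_before, frac_valid_after)
```

Every original row satisfies the dependence, while almost none of the modified rows do, so the model is being queried on inputs it never saw during training.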

Simply speaking, because of the causality problem, some of the Explainers in SHAP might give incorrect results if the independence assumption is not fulfilled. Unfortunately, in tabular data, features are (more often than not) not independent.

The documentation itself mentions it in some places, but it doesn’t state clearly which Explainers are affected and which are not. For instance, TreeExplainer seems to be OK if feature_perturbation="tree_path_dependent" (https://shap.readthedocs.io/en/latest/#shap.TreeExplainer), but it’s not useful for NNs.

SamplingExplainer always requires that assumption. KernelExplainer seems to me to be affected too, because again the variables are ‘set’:

> To determine the impact of a feature, that feature is set to “missing” and the change in the model output is observed. https://shap.readthedocs.io/en/latest/#shap.KernelExplainer

DeepExplainer in the original paper (https://arxiv.org/abs/1705.07874) seems to be under the same assumption.
GradientExplainer might be a good candidate, but I have not read the referred paper.
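For readers who want to see what "setting a feature to missing" means in practice, here is a brute-force sketch of Shapley values in which missing features are replaced by values from a background set. This is the textbook definition on a toy model, not how any of the SHAP Explainers are implemented; `model`, `value`, `shapley_values` and the data are all hypothetical.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model with an interaction between features 0 and 1
    return 2.0 * x[0] + x[0] * x[1] + 0.5 * x[2]


def value(subset, x, background):
    """Average output when features outside `subset` are 'missing',
    i.e. replaced in turn by the values of each background row."""
    outs = []
    for b in background:
        z = [x[j] if j in subset else b[j] for j in range(len(x))]
        outs.append(model(z))
    return sum(outs) / len(outs)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every feature subset."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                with_i = value(set(S) | {i}, x, background)
                without_i = value(set(S), x, background)
                phi[i] += weight * (with_i - without_i)
    return phi


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
phi = shapley_values(x, background)
base = value(set(), x, background)
print(phi, base, model(x))
```

The attributions plus the base value add up to the model output for `x`. Note that the mixed rows `z` combine values from `x` and from the background freely, which is exactly where the independence assumption enters.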

Forgive me for the long post.

2 Likes

Great analysis @hubert.misztela! I wasn’t simply saying we should be “done” there as those are certainly problems. Would those possibly be answered through dependency plots as well? As now we can see the “whole board”. IIRC I saw that they are supported.

The “missing” is the same as permutation importance, just no value at all (something we’ve used extensively). It’s through this that we get to see what’s being affected and from what you say they’re doing something extremely similar. (Or am I completely off the ball here?)
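For comparison, a minimal sketch of permutation importance on a toy model (the model, data and helper names are made up for illustration). Like the "missing" substitution, shuffling one column on its own breaks whatever dependence that column had with the others.

```python
import random


def model(row):
    # Hypothetical fitted model: uses feature 0 only and ignores feature 1
    return 3.0 * row[0]


def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, col, rng):
    """Increase in error after shuffling a single column across rows."""
    baseline = mse(rows, targets)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted, targets) - baseline


rng = random.Random(0)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
targets = [3.0 * r[0] for r in rows]

imp = [permutation_importance(rows, targets, c, rng) for c in range(2)]
print(imp)
```

Shuffling the feature the model relies on raises the error, while shuffling the ignored feature changes nothing. This is a global, per-column measure, whereas SHAP attributes a single prediction.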

I’ll also note I just started playing with this tool, I have much to learn :slight_smile: (and fantastic explanation and cover!!!)

And of course it’s a great thread!
We needed to show some love for tabular data :stuck_out_tongue:

2 Likes

Yes, there’s a bit of irony here: we humans pack most of our data in tables (if you ask a random person which is the ‘real’ data, a financial Excel sheet or a picture of a dog, I think most will choose the first one), and yet it seems we simply don’t know how to process this tabular data efficiently with NNs using any method more elaborate than a simple fully connected network :wink:

1 Like

And as for the problem hubert mentioned (dependent features), it seems like a very big deal in the real data I’ve encountered. In fact, it’s sometimes hard to find a single isolated feature (one whose value you can change to another value from the same column without having to change other columns as well). That’s why I’m starting to think that, more often than not, we should compute feature importance not for a single column but for a pair (or more) of dependent columns as a whole (maybe we should do some correlation analysis first, or use domain knowledge) :frowning:
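A rough sketch of that idea, under the assumption that a simple Pearson correlation threshold is good enough to spot dependent columns (domain knowledge may well do better). The helpers `correlated_pairs` and `permute_group` and the toy table are hypothetical.

```python
import random


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def correlated_pairs(rows, threshold=0.9):
    """Column index pairs whose absolute correlation reaches the threshold."""
    n_cols = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_cols)]
    return [(i, j) for i in range(n_cols) for j in range(i + 1, n_cols)
            if abs(pearson(cols[i], cols[j])) >= threshold]


def permute_group(rows, group, rng):
    """Shuffle the columns in `group` together, so values taken from the
    same source row stay together and their dependence is preserved."""
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [[rows[src][j] if j in group else row[j] for j in range(len(row))]
            for row, src in zip(rows, order)]


rng = random.Random(0)
# Hypothetical table: column 1 is 2 * column 0, column 2 is independent
rows = []
for _ in range(300):
    a = rng.uniform(0, 1)
    rows.append([a, 2 * a, rng.uniform(0, 1)])

pairs = correlated_pairs(rows)
permuted = permute_group(rows, {0, 1}, rng)
still_consistent = all(r[1] == 2 * r[0] for r in permuted)
print(pairs, still_consistent)
```

Permuting the pair as a unit keeps every row realistic, so the resulting importance belongs to the group rather than to either column alone.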

1 Like