Maybe another lesson is that thinking about the input features (feature engineering) is still very important. Anyway this whole thread has been extremely insightful to me, kudos to everyone!
Great work Fabio and @muellerzr! I’m just curious: is the accuracy reported as test accuracy actually the validation accuracy? When I looked at the notebooks, “test.csv” was never used. If so, did you try submitting the predictions for “test.csv” to Kaggle?
There is also a test file available with a million test examples. I have evaluated on that and got about the same accuracy as for my validation set. I only had about 86% though, not 99…
While the TabNet visualization is quite unreadable, the fact that it uses attention can be used to produce a bar plot saying, for a given prediction, how important each feature was (which is not the usual meaning of feature importance).
This can already be done with tools such as Shap but it is nice to have it baked into the model and easily accessible.
That is, to me, the main appeal of TabNet: I can train a model and then produce predictions with a plot explaining which features led to that decision, and have it reviewed by a domain expert.
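To make the idea concrete, here is a rough sketch of such a per-prediction bar plot. It assumes the standalone pytorch-tabnet package (whose TabNetClassifier exposes an explain() method built on the attention masks) and made-up data, so treat it as an illustration rather than as the fastai port’s API:

```python
import numpy as np
import matplotlib.pyplot as plt
from pytorch_tabnet.tab_model import TabNetClassifier

# Toy stand-in data; in practice these would be your tabular train/test arrays.
rng = np.random.default_rng(0)
X_train = rng.random((1024, 8)).astype(np.float32)
y_train = (X_train[:, 0] + X_train[:, 3] > 1).astype(int)
X_test = rng.random((16, 8)).astype(np.float32)
feature_names = [f"feat_{i}" for i in range(8)]

clf = TabNetClassifier()
clf.fit(X_train, y_train, max_epochs=5)

# explain() aggregates the attention masks into one importance value
# per feature, for every row we ask about.
explain_matrix, masks = clf.explain(X_test)

row = 0  # explain a single prediction
importances = explain_matrix[row] / explain_matrix[row].sum()

plt.barh(feature_names, importances)
plt.xlabel("relative importance for this one prediction")
plt.title(f"TabNet attention-based explanation, sample {row}")
plt.tight_layout()
plt.show()
```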
So we should possibly look at using Shap with fastai and provide examples for people. While it’s not in the model itself, it would ideally still be available in a line of code.
I believe most people will go to TabNet for the promised performance boost (here benchmarks help a lot), but yes, easy integration with Shap (or something similar) would make fastai’s model strictly better in my eyes (overall I feel that fastai is still lacking in the interpretation domain).
(by the way, if someone implements it, do not hesitate to send a PR or message to integrate it in the interpretation section of my fastai extensions repository list)
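As a rough illustration of the “one line of code” idea, here is a minimal sketch with SHAP’s model-agnostic KernelExplainer; predict_fn is a hypothetical wrapper around whatever tabular model you trained (fastai Learner, TabNet, …), and the data is just a placeholder:

```python
import numpy as np
import shap

def predict_fn(rows: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper: map a batch of rows to class probabilities.
    In a real setup this would call into your fastai Learner / TabNet model."""
    p = rows[:, 0]  # dummy "model" for the sake of the sketch
    return np.stack([1 - p, p], axis=1)

rng = np.random.default_rng(0)
background = rng.random((50, 4))   # small sample of training rows
X_explain = rng.random((5, 4))     # rows whose predictions we want explained

explainer = shap.KernelExplainer(predict_fn, background)
shap_values = explainer.shap_values(X_explain)

# Summary of which features drove the predictions (one array per class here).
shap.summary_plot(shap_values, X_explain,
                  feature_names=[f"feat_{i}" for i in range(4)])
```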
I will start it once I finish porting manifold mixup to V2 (so Sunday, I think).
My aim is to put the code in a .py file (that people can just drop into their projects), refactor it to improve the API if possible, add some docs, a readme with explanations/links, and a demo notebook to illustrate it.
Look into nbdev (if you haven’t already)! If I can’t port my implementation over (for whatever reason) into the fastai2 library directly, I’ll be doing the same so people can just pip install fastaishap.
I would like to point out one relevant aspect: the required assumptions.
Simply speaking, because of the causality problem, some of the Explainers in SHAP might give incorrect results if the independence assumption is not fulfilled. In tabular data, unfortunately, features are (more often than not) not independent.
The documentation itself mentions it in some places, but it doesn’t state clearly which Explainers are affected and which are not. For instance, TreeExplainer seems to be ok if …, but it’s not useful for NNs. SamplingExplainer always requires that assumption. KernelExplainer seems to me to be affected too, because again the variables are ‘set’: …
DeepExplainer in the original paper (https://arxiv.org/abs/1705.07874) seems to be under the same assumption. GradientExplainer might be a good candidate, but I have not read the referenced paper.
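To make that concrete, here is a quick sketch (toy model and data, purely illustrative) of how the different Explainers mentioned above are instantiated; the feature_perturbation argument of TreeExplainer and the Kernel/Sampling vs Deep/Gradient split are where those assumptions show up:

```python
import numpy as np
import torch
import torch.nn as nn
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4)).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
background = X[:50]

# Tree models: "interventional" perturbation uses a background dataset,
# "tree_path_dependent" follows the tree structure instead (no background needed).
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
tree_exp = shap.TreeExplainer(rf, data=background,
                              feature_perturbation="interventional")

# Model-agnostic explainers that "set" features to background values,
# which is where the independence assumption bites.
f = lambda rows: rf.predict_proba(rows)[:, 1]
kernel_exp = shap.KernelExplainer(f, background)
sampling_exp = shap.SamplingExplainer(f, background)

# Neural-net-specific explainers (the candidates for a fastai tabular model).
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
deep_exp = shap.DeepExplainer(net, torch.from_numpy(background))
grad_exp = shap.GradientExplainer(net, torch.from_numpy(background))
```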
Great analysis @hubert.misztela! I wasn’t simply saying we should be “done” there, as those are certainly problems. Could those possibly be addressed through dependence plots as well, now that we can see the “whole board”? IIRC I saw that they are supported.
The “missing” is the same as permutation importance, just with no value at all (something we’ve used extensively). It’s through this that we get to see what’s being affected, and from what you say they’re doing something extremely similar. (Or am I completely off base here?)
I’ll also note I just started playing with this tool, so I have much to learn (and fantastic explanation and cover!!!)
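For anyone following along, here is a small illustrative sketch (toy model and data) showing both a SHAP dependence plot and scikit-learn’s permutation importance side by side, since those are the two things being compared above:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# SHAP dependence plot: how the SHAP value of feature 0 changes with its value
# (SHAP also picks an interaction feature for the colouring automatically).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
shap.dependence_plot(0, sv, X)

# Permutation importance: the drop in score when one feature's values are
# shuffled, which is close in spirit to the "missing"/no-value idea above.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feat_{i}: {imp:.3f}")
```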