This paper seems to have flown a bit under the radar in August. It was put out by two authors from Google Cloud, and it proposes an attention-based network design for tabular data that appears to outperform tree-based models AND provide interpretability.
Seems too good to be true, but worth examining.
There are TensorFlow and PyTorch implementations of this network out there. Given how much tabular data is out there, I'm surprised this hasn't gotten more attention.
Yes I did, and I taught it in my course, see here:
If you're attention-focused and can't be convinced by FI (though I'd argue it's a pretty good idea if it makes sense to you), then use TabNet. Keep in mind that it may not be as accurate as the straight fastai model, and it will probably take much longer to train (in terms of epochs, though not necessarily seconds).
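For context, the FI being compared against TabNet's attention-based interpretability is tree-model feature importance. A minimal sketch of that baseline, using scikit-learn's `RandomForestClassifier` on a toy dataset (the dataset and parameters here are illustrative, not from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy tabular dataset, purely illustrative
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Impurity-based feature importances: one value per column, summing to 1.
# This is the kind of FI signal the post suggests may already be "good enough"
# if it makes sense for your problem.
importances = rf.feature_importances_
for i, imp in enumerate(importances):
    print(f"feature {i}: {imp:.3f}")
```

TabNet instead produces per-sample attention masks over features, which is the interpretability claim that distinguishes it from this global, model-wide importance vector.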