As I’ve been wanting to try out various new tabular models, I keep coming back to the same two datasets we use without much variation (ADULT and Rossmann), so I’ve constructed a baseline (of sorts) against which fastai can be compared. The repository is here, including a notebook showing how I achieved these baselines here. The baselines themselves were taken from the TabNet paper, which was published in September of 2019. The goal is to present three things: the model, its accuracy, and the total number of parameters in the model. Also, as a request: if you do work with these, try to post your feature importance as well, as there could be very interesting developments in where everyone’s models lean.
Challenge: Successfully identify the rank of the current hand based on the suit (categorical) and rank (numerical) of each of the five cards in your hand (10 variables in total).
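For reference, that 10-variable layout can be sketched like this (the `S1`/`C1` … `S5`/`C5` column names follow the common Poker Hand dataset layout, and the suit mapping here is an assumption for illustration, not the dataset's documentation):

```python
# Illustrative sketch of the Poker Hand feature layout:
# five cards, each contributing one categorical suit and one numerical rank.
SUITS = {1: "hearts", 2: "spades", 3: "diamonds", 4: "clubs"}  # assumed mapping

def encode_hand(cards):
    """cards: list of five (suit, rank) pairs, suit in 1..4, rank in 1..13.
    Returns a flat dict of 10 features: S1..S5 categorical, C1..C5 numerical."""
    assert len(cards) == 5
    features = {}
    for i, (suit, rank) in enumerate(cards, start=1):
        features[f"S{i}"] = SUITS[suit]   # categorical suit label
        features[f"C{i}"] = rank          # numerical rank
    return features

hand = [(1, 10), (1, 11), (1, 12), (1, 13), (1, 1)]  # a royal flush in hearts
print(encode_hand(hand))
```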
Challenge: Using simulated data with features characterizing events detected by ATLAS, classify events into “tau tau decay of a Higgs boson” versus “background.”
Honestly, I’m unsure as to what it could be. Poker Hand is a particularly hard problem, hence why most models got <70%. In terms of performance, I found the Sarcos Robotics Arm Inverse Dynamics dataset a bit more eye-opening. What surprises me about that one is that there were no categorical features, something I would have expected in order for the model to perform well.
That’s a very interesting question.
Before I really tried the Poker dataset, I thought that a NN could easily handle this deterministic case. Oh boy, how wrong I was…
My best result so far is 73% (fastai v1), nowhere near the 99-ish result.
My mind is split between ‘it should be easy’ and ‘99% doesn’t look real without heavy feature engineering’.
Don’t worry, you’re not going crazy. No one else has been able to match it, hence why I brought it down (even the TF implementation). Could you share your 73% one? That’s fantastic!
I’ll try to reproduce it tomorrow as it’s 1 am here now.
I’ve saved the learner (of 0.73), but I have some doubts about reproducibility, as after an hour of experimentation it just pushed past the 65-ish threshold and moved forward.
Most of the time it still feels like alchemy to me, rather than a directed search.
I’ve achieved 91.7% accuracy with fastai v1
Here is the notebook
No feature engineering was used, except that I turned all of the columns into categorical features.
I achieved it by switching off regularization completely. The model did overfit heavily, but that didn’t matter to me as long as the validation accuracy went up.
To check everything, I tested the model with the test data. I labeled it automatically, since detecting the hand from the cards can be done algorithmically (I just did not label straight and royal flushes in the test data, but that’s a very tiny percentage of all the cases, so it should be insignificant in terms of accuracy).
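The algorithmic labeling mentioned above can be sketched roughly like this (my own sketch, not the poster’s code; it covers all ten UCI Poker Hand classes, including the straight/royal flushes the poster skipped):

```python
from collections import Counter

def hand_rank(cards):
    """Label a five-card hand with the UCI Poker Hand class (0-9).
    cards: list of (suit, rank) pairs with rank 1 (ace) .. 13 (king)."""
    suits = [s for s, _ in cards]
    ranks = sorted(r for _, r in cards)
    flush = len(set(suits)) == 1
    # a straight is five consecutive ranks; the ace (1) can also play high
    straight = (ranks == list(range(ranks[0], ranks[0] + 5))
                or ranks == [1, 10, 11, 12, 13])
    counts = sorted(Counter(ranks).values(), reverse=True)
    if flush and straight:
        return 9 if ranks == [1, 10, 11, 12, 13] else 8  # royal / straight flush
    if counts[0] == 4:       return 7  # four of a kind
    if counts[:2] == [3, 2]: return 6  # full house
    if flush:                return 5
    if straight:             return 4
    if counts[0] == 3:       return 3  # three of a kind
    if counts[:2] == [2, 2]: return 2  # two pairs
    if counts[0] == 2:       return 1  # one pair
    return 0                           # nothing in hand
```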
The test dataset (100k rows) confirmed 91% accuracy.
Here is the link to the code I mentioned earlier, @muellerzr!
I could get 99.48% with fastai2 on the Poker Rule Induction dataset (go to cell #130 of the notebook). It was kind of a pain (lots of epochs), but I suspect it could get even better, because the validation loss was still decreasing. I don’t know if we could speed up the training with some trick.
I managed to get 99.10% with TabNet, and suspect it could also get better with more epochs.
Great work @fmobrj75! I’m going to quickly run it now (possibly with some modifications) and then I’ll give you some thoughts on why that could be working.
@fmobrj75 if I had to guess, we’re able to extract more information by having it as both a numerical feature (which it suits) and as a categorical one (which it also suits). I’m running a quick comparison/reproducibility test and I’ll update with the results, but here’s what I’m doing:
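The duplicate-encoding idea can be sketched like this (a hypothetical helper, not the notebook’s actual preprocessing; the `C{i}` column names are an assumption): each rank stays a number, so its ordinal signal survives, while a string copy lets an embedding treat every rank as its own symbol.

```python
def duplicate_rank_features(row):
    """Keep each card's rank both as a number (ordinal signal) and as a
    string label (categorical copy for an embedding). Column names are
    illustrative, not the notebook's actual ones."""
    out = dict(row)
    for i in range(1, 6):
        out[f"C{i}_cat"] = f"rank_{row[f'C{i}']}"  # categorical duplicate
    return out

row = {"S1": 1, "C1": 10, "S2": 1, "C2": 11, "S3": 1, "C3": 12,
       "S4": 1, "C4": 13, "S5": 1, "C5": 1}
print(duplicate_rank_features(row)["C1_cat"])  # rank_10
```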
500 epochs total with a LR of 1e-1:
Adam + One Cycle
Ranger + Flat Cos
Mish + Adam
Mish + Ranger
Also, each is run 5 times, so their averages will be reported once done (this is important, I think).
@fmobrj75 I wrote a callback to help keep track of our best accuracies. What I can tell you right now is that with Adam I achieve ~99.46–99.48 around epoch 320/330, so not nearly as bad as the 5/600 you were doing.
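The actual fastai callback isn’t shown in the thread, but the bookkeeping it describes amounts to something like this framework-agnostic sketch (call `update()` once per epoch from any training loop):

```python
class BestAccuracyTracker:
    """Minimal sketch of a 'track the best accuracy' callback.
    Not the thread's actual fastai Callback, just the same idea."""
    def __init__(self):
        self.best_acc = 0.0
        self.best_epoch = None

    def update(self, epoch, accuracy):
        # remember the highest validation accuracy and when it occurred
        if accuracy > self.best_acc:
            self.best_acc = accuracy
            self.best_epoch = epoch

tracker = BestAccuracyTracker()
for epoch, acc in enumerate([0.90, 0.97, 0.9946, 0.9940]):
    tracker.update(epoch, acc)
print(tracker.best_acc, tracker.best_epoch)  # 0.9946 2
```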
I decided to keep it to just Adam; I had a few issues with the learning rate on Ranger (note I was fitting for 400 epochs straight in total). I did get a very tight grouping though:
Overall:
Accuracy Mean/Std: 99.44/0.029%
Num Epochs Mean/Std: 299.4/27.68.
Total Computation Time to get there: 1 minute 36 seconds
Great work @fmobrj75
And I’m starting to think about what this tells us about TabNet (and maybe other specialized architectures). Does this mean that plain fully connected networks are still better (not worse) than the fancy new architectures?
@Pak to add a bit more to that: these fancy new architectures also aren’t as efficient. TabNet required 2x the training time per epoch, for even more epochs than our architecture took, and had 2–16x the total parameters. So I’m not sold on it yet. There are obviously more to try, like DeepGBM, but I feel like these simple fully connected models do the job more than well enough.
TabNet was supposed to be different because we could explain our models ‘better’, but isn’t that what feature importance and dependency plots give us? What is the weakness right now? (Would love your thoughts on that, @Pak.)
More on the feature importance: if we wanted to see attention, we could simply look at different feature importances based on the respective ‘y’s from a labeled dataset, no?
To be honest, I just couldn’t quite work out what the interpretability TabNet provides really means.
As I understand it, they tell us that they can determine feature importance. OK, but that can also be achieved with other techniques.
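One such technique is permutation importance: shuffle a single feature column and measure how much accuracy drops. A minimal sketch, with a toy predictor and made-up data purely for illustration:

```python
import random

def permutation_importance(predict, X, y, col, n_repeats=5, seed=0):
    """Shuffle column `col` of X and return the mean accuracy drop.
    `predict` maps a list of rows to a list of predicted labels."""
    rng = random.Random(seed)
    base = sum(p == t for p, t in zip(predict(X), y)) / len(y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[:] for row in X]        # copy rows
        values = [row[col] for row in shuffled]
        rng.shuffle(values)                     # permute one column
        for row, v in zip(shuffled, values):
            row[col] = v
        acc = sum(p == t for p, t in zip(predict(shuffled), y)) / len(y)
        drops.append(base - acc)
    return sum(drops) / n_repeats

# Toy model that only looks at column 0: shuffling column 0 should hurt,
# shuffling column 1 should do nothing.
predict = lambda rows: [int(r[0] > 0) for r in rows]
X = [[1, 5], [-1, 5], [1, -5], [-1, -5]] * 25
y = [1, 0, 1, 0] * 25
print(permutation_importance(predict, X, y, col=0))  # large positive drop
print(permutation_importance(predict, X, y, col=1))  # 0.0
```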
Correct me if I’m wrong, but they also show which features are “fired up” at each step (and, I think, for each example). In theory this could be a useful tool for understanding how it makes its decisions.
But to be honest, for now I haven’t really understood how to ‘read’ their layer-feature-interpretation pictures.
From what I saw, the more “lights” a feature shows, the more it was used. Which, as we know, can also be obtained with feature importance. So you’re not alone; that’s the same conclusion I reached. It’s just that now people have a picture rather than a computation, I think?