Don’t worry. You’re not going crazy. No one else has been able to match it (not even the TF implementation), which is why I brought it up. Could you share your 73% one? That’s fantastic!
I’ll try to reproduce it tomorrow as it’s 1 am here now.
I’ve saved the learner (the 0.73 one), but I have some doubts about reproducibility: after an hour of experimentation it just pushed past the 65-ish threshold and kept going.
Most of the time it still feels like alchemy to me rather than a directed search.
I still think it is, with FE as us adding in bits to the pot
Also @Pak for all their baselines, zero FE was used
99% is starting to feel not so impossible now.
I’ve achieved 91.7% accuracy with fastai v1
Here is the notebook
No feature engineering was used, except that I turned all the columns into categorical features.
I achieved it by switching off regularization completely. The model did overfit heavily, but that didn’t matter to me as long as validation accuracy went up.
To check everything, I tested the model on the test data. I labeled it automatically, since detecting the hand from the cards can be done algorithmically (I just did not label straight and royal flush in the test data, but that is a tiny percentage of all cases and should be insignificant in terms of accuracy).
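For anyone who wants to do the same kind of automatic labeling, here is a minimal sketch. It is not the code from the notebook: it assumes the UCI Poker Hand encoding (suits 1-4, ranks 1-13 with Ace = 1, classes 0-9), the function name `label_hand` is made up for illustration, and unlike the labeling described above it also covers the straight and flush variants.

```python
from collections import Counter

def label_hand(cards):
    """Return the class (0-9) for five (suit, rank) pairs.

    Assumes the UCI Poker Hand encoding: suits 1-4, ranks 1-13, Ace = 1.
    """
    suits = [s for s, _ in cards]
    ranks = sorted(r for _, r in cards)
    counts = sorted(Counter(ranks).values(), reverse=True)

    is_flush = len(set(suits)) == 1
    is_royal_ranks = ranks == [1, 10, 11, 12, 13]
    # Five distinct consecutive ranks, or the Ace-high run 10-J-Q-K-A.
    is_straight = is_royal_ranks or (
        len(set(ranks)) == 5 and ranks[4] - ranks[0] == 4
    )

    if is_flush and is_royal_ranks:
        return 9  # royal flush
    if is_flush and is_straight:
        return 8  # straight flush
    if counts[0] == 4:
        return 7  # four of a kind
    if counts == [3, 2]:
        return 6  # full house
    if is_flush:
        return 5  # flush
    if is_straight:
        return 4  # straight
    if counts[0] == 3:
        return 3  # three of a kind
    if counts == [2, 2, 1]:
        return 2  # two pairs
    if counts[0] == 2:
        return 1  # one pair
    return 0      # nothing in hand

example = label_hand([(1, 1), (2, 1), (3, 5), (4, 5), (1, 9)])  # two pairs
```

Comparing these labels against the model's predictions on the unlabeled test file gives an accuracy estimate without needing a Kaggle submission.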
Test dataset (100k rows) confirmed 91% accuracy
Awesome job @Pak
Here is the link to the code I mentioned earlier, @muellerzr!
I could get 99.48% with fastai2 on the poker rule induction dataset (go to cell #130 of the notebook). It was kind of a pain (lots of epochs), but I suspect it could get even better because the validation loss was still decreasing. I don’t know if we could speed up the training with some trick.
I managed to get 99.10% with TabNet, and suspect it could also get better with more epochs.
Great work @fmobrj75! I’m going to quickly run it now (with possibly some modifications) and then I’ll give you some thoughts of why that could be working
@fmobrj75 if I had to guess, we’re able to extract more information by having it as both a numerical feature (which suits it) and a categorical feature (which also suits it). I’m running a quick comparison/reproducibility test and I’ll update with the results, but here’s what I’m doing:
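To make the "both numerical and categorical" idea concrete, here is a rough, library-free sketch of what the model ends up seeing for one column: an integer code that would index an embedding table, plus a standardized float that would go to the continuous branch. The helper name `dual_encode` is hypothetical; in a tabular library this would presumably amount to listing the same column (or a copy of it) among both the categorical and the continuous inputs.

```python
from statistics import mean, pstdev

def dual_encode(values):
    """Return (category_codes, standardized_values) for one column."""
    # Categorical view: map each distinct value to an integer code.
    levels = sorted(set(values))
    code_of = {v: i for i, v in enumerate(levels)}
    codes = [code_of[v] for v in values]

    # Numerical view: standardize the very same raw values.
    mu, sigma = mean(values), pstdev(values)
    sigma = sigma or 1.0  # guard against constant columns
    scaled = [(v - mu) / sigma for v in values]
    return codes, scaled

ranks = [1, 13, 7, 7, 10]  # card ranks from one column, illustrative only
codes, scaled = dual_encode(ranks)
```

The categorical view lets the network learn an arbitrary vector per rank, while the numerical view keeps the ordering of the ranks available, which is probably what helps with hands like straights.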
500 epochs total with a LR of 1e-1:
Adam + One Cycle
Ranger + Flat Cos
Mish + Adam
Mish + Ranger
Each is also run 5 times, so their averages will be reported once done (this is important, I think).
@fmobrj75 I wrote a callback to help keep track of our best accuracies. What I can tell you right now is that with Adam I reach ~99.46-99.48% around epoch 320-330, so not nearly as bad as the 500-600 you were doing.
I decided to keep it on just Adam, as I had a few issues with the learning rate on Ranger (note I was fitting for 400 epochs straight in total). I did get a very tight grouping though:
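The idea behind such a callback can be sketched in a few lines. This is not the callback from the notebook and does not use the fastai callback API; `BestAccuracyTracker` and `after_epoch` are hypothetical names, and the accuracy values are made up.

```python
class BestAccuracyTracker:
    """Hypothetical helper: remembers the best accuracy and its epoch."""

    def __init__(self):
        self.best_accuracy = float("-inf")
        self.best_epoch = None

    def after_epoch(self, epoch, accuracy):
        # Strict '>' keeps the earliest epoch when accuracies tie.
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            self.best_epoch = epoch

tracker = BestAccuracyTracker()
for epoch, acc in enumerate([0.52, 0.61, 0.74, 0.74, 0.70]):  # made-up values
    tracker.after_epoch(epoch, acc)
```

Tracking the epoch alongside the accuracy is what makes it possible to say how many epochs a run actually needed, rather than how many it was given.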
Accuracy Mean/Std: 99.44/0.029%
Num Epochs Mean/Std: 299.4/27.68.
Total Computation Time to get there: 1 minute 36 seconds
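For reference, a summary like the one above can be computed from the per-run bests roughly as follows. This is a sketch with placeholder numbers, not the thread's results, and it assumes the sample standard deviation (the notebook may have used the population version instead); `summarize` is a made-up helper name.

```python
from statistics import mean, stdev

def summarize(runs):
    """runs: list of (best_accuracy, best_epoch) pairs, one per run."""
    accs = [a for a, _ in runs]
    epochs = [e for _, e in runs]
    return {
        "acc_mean": mean(accs), "acc_std": stdev(accs),
        "epoch_mean": mean(epochs), "epoch_std": stdev(epochs),
    }

# Placeholder values for illustration only, not the thread's results.
runs = [(0.90, 10), (0.92, 12), (0.94, 14)]
summary = summarize(runs)
```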
Great work @fmobrj75
Here’s that notebook: https://github.com/muellerzr/fastai2-Tabular-Baselines/blob/master/Adam_Poker_99_4.ipynb
Quoting Jeremy on what to call it:
“Branched covariate embeddings” AKA my categorical feature is also a numerical one
“BREMCO: BRanched EMbedded COvariates”
Fabio, Zachary, that’s really awesome.
And I’m starting to wonder what this tells us about TabNet (and maybe other specialized architectures). Does this mean that plain fully connected nets are still better than (or at least no worse than) the fancy new architectures?
Quite possibly. I’ll be looking into this the next few months or so, but it seems to be that way.
@Pak to add a bit more to that: these fancy new architectures also aren’t as efficient. TabNet required 2x the training time per epoch, for even more epochs than our architecture took, and had 2-16x the total parameters. So I’m not sold on it yet. There are obviously more to try, like DeepGBM, but I feel like these simple fully connected models do the job well enough.
TabNet was different because we could explain our models ‘better’, but is this not what FI and dependency plots give us? What is the weakness right now? (Would love your thoughts on that @Pak)
More on the FI, if we wanted to see attention, we simply choose to look at different FI based on the respective ‘y’’s from a labeled dataset, no?
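As a sketch of what "FI per label" could look like, here is a small permutation-importance routine that can be restricted to rows of one true class. It assumes permutation importance is the kind of FI meant here, uses a toy `predict` function in place of a trained model, and all names are hypothetical.

```python
import random

def permutation_importance(predict, X, y, only_class=None, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled.

    predict: function mapping one row (a list) to a predicted label.
    only_class: if given, score only rows whose true label equals it.
    """
    rng = random.Random(seed)
    keep = [i for i, label in enumerate(y)
            if only_class is None or label == only_class]

    def accuracy(rows):
        return sum(predict(rows[i]) == y[i] for i in keep) / len(keep)

    baseline = accuracy(X)
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(shuffled))
    return drops

# Toy data: the label depends only on the sign of feature 0.
X = [[1, 5], [-1, 5], [2, 3], [-2, 3], [3, 1], [-3, 1]]
y = [1, 0, 1, 0, 1, 0]
predict = lambda row: int(row[0] > 0)

drops_all = permutation_importance(predict, X, y)
drops_class1 = permutation_importance(predict, X, y, only_class=1)
```

A feature the model ignores gets a drop of exactly zero, and comparing the per-class drops gives a rough, computed counterpart to the per-step pictures.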
To be honest, I just couldn’t fully work out what the interpretability that TabNet provides actually means.
As I understand it, they say it can determine feature importance. OK, but that can also be achieved with other techniques.
Correct me if I’m wrong, but they also show which features are “fired up” at each step (and, I think, for each example). In theory this could be a useful tool for understanding how it makes its decisions.
But to be honest, so far I have not really understood how to ‘read’ their layer-feature-interpretation pictures.
From what I saw, the more “lights” a feature shows, the more it was used. Which, as we know, can also be done with feature importance. So no, you’re not wrong; that’s the same thing I got. It’s just that now people have a picture rather than a computation, I think?
In that case it seems to me that pictures like these https://hsto.org/webt/a7/48/4w/a7484wg2rlenmf_a8kblco4lqbc.png are easier to interpret and to base real-world decisions on than ones like these https://c2n.me/45STWWh.png
Although I am aware that the latter picture may be more helpful in understanding how the model itself works.
Maybe another lesson is that thinking about the input features (feature engineering) is still very important. Anyway this whole thread has been extremely insightful to me, kudos to everyone!
Great work Fabio and @muellerzr! I’m just curious: is the accuracy reported as test accuracy actually the validation accuracy? When I looked at the notebooks, “test.csv” was never used. If so, did you try submitting the predictions for “test.csv” to Kaggle?
For now, just validation accuracy. Will post to Kaggle later and keep you informed.