On a bit of a side note, one thing I’d like to do is take all their baselines and run it through fastai’s tabular model for a fair comparison too have you tried this yet for the datasets they looked at? I mention this as ADULTs is a particularly challenging dataset that doesn’t do much (or can have much improvement) based on the recent studies. (The highest I’ve seen was GBT with 89% I think?)
Mostly what I’ve been looking at is A: accuracy, B: overall training time, and C: Model complexity. As (usually) the nice thing with tabular models they’re fast at training and at inference. Let me know your thoughts
Yes, the interpretability is another bit. I’d certainly be interested in seeing this, as it was a bit hard to wrap my head around when I read the paper initially (it’s been a little bit, been focused on vision for the last few months). Could you try to basically sum up what they described? (I’ll also revisit the paper here once all the vision notebooks are done)
@grankinI believe that something may be wrong. Nope, just they train for a very long time.
learning rate of 0.02 (decayed 0.9 every 10k iterations with an exponential decay) for 71k iterations.
Sweet holy moley. 71,000 batches?! Let me try again…
By the way, if my math is right, that is the equivalent of 11,000 epochs.
(They have a 4096 batch size, 4096 * 71,000 iterations / 25000 rows)
Running those now
Surpassed fastai at ~1600 epochs
Just had a quick thought too, let me see…
Well, it won’t let me post so, let me post an exciting update here:
Alright so, half way through training I got a harebrained idea. This arch doesn’t change much, right? Can I use transfer learning here. I did a quick experiment. 50 epochs with the poker dataset, then 50 epochs transfered onto Adults. Here is what I found.
To reach above 80% accuracy:
Non-Transfer learning: epoch 31
Transfer learning: epoch 11
There is certainly promise here.
Next is to try the reverse
Reverse didn’t make much of a difference, unsure why that’s the case but it is!
So I got the code for freezing and whatnot figured out. I’m training the poker model tonight then go from there tommorow morning. (Or this morning? I think it’s like 3:30 am)
Hmmmm… I could not recreate their accuracy they achieved for Poker Hand…
@grankin even running in their source code I was unable to match it (PyTorch), I got early stopping ~epoch 207, which had an accuracy of 54%. (I still wasn’t able to achieve this on our version but that is a far cry from 99%)
I think it’s a great idea to test on different datasets different algorithms, I’m eager to know about the results!
My only concern with your fastai wrapper is that it does not use our repo but a copy/paste of it, so for example the bugfix we released this week is not taken into account. Is there a way to make a fastai wrapper that uses directly our repo? Moreover, our implementation is scikit compatible so I am not sure about the need for a fastai wrapper but that’s another story By the way if you see some improvements that could be done with the code, please let us know! Feel free to contribute as well!
I will probably do some experiments myself with pytorch-tabnet I’ll let you know if I can share some results!
A bit off topic but I just wanted to say thank you for porting the library to PyTorch. I have made some kind of simple CLI on top of the original TensorFlow implementation to run benchmarks and hyperparameter tuning on different datasets. The PyTorch version with sklearn interface will make this more straightforward. I’ll also be happy to report results to compare with other approaches including standard FastAI and FastAI TabNet. Nice work!
@Optimo I’m not sure this is possible as fastai’s dataloaders make our inputs differently than what it normally is (I believe), and do our input is 2 different Tensors (one with categorical and one with numerical). You can see this in the forward() method, as it’s very different. Also thank you for porting it over for us (also calling it a copy/paste kinda kills the effort in which @grankin went into doing it, as I’m sure it’s a bit more than that, else it would just be the direct thing)
@Optimo or @mikaelh you may be more familiar with this, do you know any other datasets that the literature uses for comparisons? There’s not a lot of literature with using fastai’s technique as a comparison (hence this thread) and I was wondering if there was more I missed (besides the salary dataset)
I know there have been some mega-benchmarks with hundreds of datasets (for example one paper that concluded that random forests are the best classification algorithm; it’s a couple of years old though). The SeLU paper reports results on 121 UCI datasets.