I’m wondering whether anyone here has played around with the NODE algorithm from Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data & this repo. It seems like the kind of thing that @jeremy might be particularly keen on.
I ran their examples and reproduced the results from the paper, but running on my own (admittedly small) datasets, I’m not getting anywhere near a basic random forest. Also, even on a GPU, NODE seems to be painfully slow. I suspect you might be able to speed things up considerably using high / one-cycle learning rates, but haven’t had a chance to play yet.
I tried NODE. I didn't get very good results, and it was very buggy when used with anything other than the Adult dataset, so I stopped working with it. I couldn't match a tabular_learner with NODE.
What do you mean buggy? Hard to get to converge or actual bugs?
tabular_learner is just an MLP + embedding layers for categorical variables, correct?
I tried taking advantage of the tabular embeddings, so I replaced the main bit of the model with the NODE layer. I was able to get it working on the Adult dataset, but I could not with Rossmann. Actual bugs. I haven't had the time to work through them yet. Try it out; if you get it working, please share it. I'll share my (very) messy NODE notebook I started, and perhaps you can see something I missed.
@muellerzr, did you finally get NODE working properly?
@nestorDemeure "properly" is not exactly what I would call it. I'm revisiting it in ~2-3 weeks or so, when I have time to focus on it for the course.
@nestorDemeure what I can say is that it's painfully slow to train and has a TON of parameters, so I wasn't even considering teaching it. For comparison:
fastai tabular learner: 530k parameters
TabNet-L (the largest variant): 1.75M parameters
NODE with two ODST blocks (the bare minimum recommended, with a depth of 6 and a tree_dim of 3): 19,810,232 parameters
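For anyone curious where that huge NODE count comes from, here's a back-of-envelope sketch in plain Python based on the ODST parameterization described in the paper (per-split feature-selection logits, thresholds, and temperatures, plus a response tensor with one entry per leaf). The tree count (1024 per layer) and post-embedding input width (42) below are my guesses, not confirmed repo defaults, but they illustrate why the total explodes: the response term grows as 2^depth, and the dense connectivity means layer 2's feature-selection matrix sees the raw features plus all of layer 1's outputs.

```python
def odst_layer_params(in_features, num_trees=1024, depth=6, tree_dim=3):
    """Rough parameter count for one ODST layer (assumed parameterization:
    selection logits, split thresholds, temperatures, leaf responses)."""
    selection = in_features * num_trees * depth    # which feature each split attends to
    thresholds = num_trees * depth                 # one threshold per split
    temperatures = num_trees * depth               # one temperature per split
    responses = num_trees * tree_dim * 2 ** depth  # one tree_dim output per 2^depth leaves
    return selection + thresholds + temperatures + responses

n_features = 42  # hypothetical post-embedding input width for Adult
layer1 = odst_layer_params(n_features)
# Dense block: layer 2's input is the raw features plus layer 1's outputs,
# so its selection matrix dominates the total.
layer2 = odst_layer_params(n_features + 1024 * 3)
print(layer1, layer2, layer1 + layer2)
```

With these assumed settings the two layers together land in the same ~19.8M ballpark as the count above, with the second (densely connected) layer accounting for nearly all of it.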
Average epoch time on a P100 is ~1 minute (this is on the Adult dataset); the other two were considerably slower. And if I attempt a bigger batch size (say 4096, which should be doable for most tabular models), I get a CUDA OOM.
Those were the main drawbacks I saw.
Thank you for the reply; it makes sense that under those conditions one would not want to use it.
I guess that, if accuracy matters, you currently recommend using the fastai architecture?
I haven't found a NN architecture yet that can top it.
That being said though, I have not experimented with DeepGBM yet: https://github.com/motefly/DeepGBM