NODE for Deep Learning on Tabular Data

Hi –

I’m wondering whether anyone here has played around with the NODE algorithm from the paper Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data and its accompanying repo. It seems like the kind of thing that @jeremy might be particularly keen on.

I ran their examples and reproduced the results from the paper, but on my own (admittedly small) datasets I’m not getting anywhere near the performance of a basic random forest. Also, even on a GPU, NODE seems painfully slow. I suspect you could speed things up considerably using high / one-cycle learning rates, but I haven’t had a chance to play with that yet.
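For anyone who wants to try the one-cycle idea without fastai, PyTorch’s built-in OneCycleLR scheduler is probably the quickest way to experiment. A minimal sketch with a stand-in model and synthetic data (you’d swap in the NODE model and a real DataLoader):

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data -- swap in a NODE model and a real dataset.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
ds = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
train_loader = DataLoader(ds, batch_size=128, shuffle=True)

epochs = 10
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# One-cycle: the LR ramps up to max_lr, then anneals back down,
# stepping once per batch.
scheduler = OneCycleLR(optimizer, max_lr=1e-2,
                       total_steps=epochs * len(train_loader))

for _ in range(epochs):
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(xb), yb)
        loss.backward()
        optimizer.step()
        scheduler.step()
```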

~ Ben

I tried NODE. I didn’t get very good results, and it was very buggy when I tried to use it with anything other than the Adult dataset, so I stopped working with it. I couldn’t match a tabular_learner with NODE.

What do you mean by buggy? Hard to get it to converge, or actual bugs?

A tabular_learner is just an MLP + embedding layers for categorical variables, correct?
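Something like this, if I understand it correctly (a simplified sketch of the idea, not fastai’s actual implementation):

```python
import torch
from torch import nn

class SimpleTabularModel(nn.Module):
    """Rough shape of a fastai-style tabular model: one embedding per
    categorical column, concatenated with the continuous columns,
    then a plain MLP on top."""
    def __init__(self, emb_szs, n_cont, out_sz, hidden=200):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(c, s) for c, s in emb_szs])
        n_emb = sum(s for _, s in emb_szs)
        self.mlp = nn.Sequential(
            nn.Linear(n_emb + n_cont, hidden), nn.ReLU(),
            nn.Linear(hidden, out_sz),
        )

    def forward(self, x_cat, x_cont):
        embs = [e(x_cat[:, i]) for i, e in enumerate(self.embeds)]
        return self.mlp(torch.cat(embs + [x_cont], dim=1))

# e.g. two categorical columns with 10 and 5 levels, 3 continuous columns
m = SimpleTabularModel(emb_szs=[(10, 6), (5, 3)], n_cont=3, out_sz=2)
out = m(torch.randint(0, 5, (32, 2)), torch.randn(32, 3))
```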

I tried taking advantage of the tabular embeddings, so I replaced the main body of the model with the NODE layer. I was able to get it working on the Adult dataset, but I could not with Rossmann. Actual bugs. I haven’t had the time to work through them yet. Try it out; if you get it working please share it :slight_smile: I’ll share my (very) messy NODE notebook I started, and perhaps you can see something I missed.
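For reference, the swap I attempted looks roughly like this. This is only a sketch against fastai v2’s TabularModel (which stores the total embedding width in n_emb, the continuous-column count in n_cont, and the MLP body in .layers); the Linear is a stand-in for where the repo’s NODE block would go:

```python
from torch import nn
from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)
dls = TabularDataLoaders.from_csv(
    path/'adult.csv', path=path, y_names='salary',
    cat_names=['workclass', 'education', 'marital-status',
               'occupation', 'relationship', 'race'],
    cont_names=['age', 'fnlwgt', 'education-num'],
    procs=[Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, layers=[200, 100])

# Width of the concatenated embeddings + continuous features that
# TabularModel feeds into its MLP body.
n_in = learn.model.n_emb + learn.model.n_cont

# Stand-in for the NODE block: replace this Linear with e.g. the repo's
# DenseBlock, plus whatever reduces its output to 2 classes for 'salary'.
learn.model.layers = nn.Sequential(nn.Linear(n_in, 2))
learn.fit_one_cycle(1)
```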


@muellerzr, did you ever get NODE working properly?

@nestorDemeure “properly” is not exactly what I would call it. I’m revisiting it in ~2-3 weeks or so, when I’ll have time to focus on it for the course.


@nestorDemeure what I can say is that it’s painfully slow to train and has a TON of parameters, so I wasn’t even thinking of teaching it at all. For an example comparison:

fastai tabular learner: 530k parameters
TabNet-L (the largest variant): 1.75M parameters
NODE with two ODST blocks (the bare minimum recommended, with a depth of 6 and tree_dim of 3): 19,810,232 (~19.8M) parameters

Average epoch time for the fastai learner on a P100 is ~1 minute; the other two were considerably slower (this is on the Adult dataset). And if I attempt a bigger batch size (say 4096, which should be doable for almost any tabular model), I get a CUDA OOM.

Those were the main drawbacks I saw.
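If anyone wants to reproduce this comparison on their own models, the count is a one-liner over any PyTorch module:

```python
from torch import nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. count_parameters(learn.model) for a fastai learner,
# or count_parameters(node_model) for the NODE network
```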


Thank you for the reply; it makes sense that in those conditions one would not want to use it.

I guess that, if accuracy matters, you currently recommend using the fastai architecture?

I haven’t found a NN architecture yet that can top it :wink:


That said, I have not experimented with DeepGBM yet: https://github.com/motefly/DeepGBM