I’m wondering whether anyone here has played around with the NODE algorithm from Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data & this repo. It seems like the kind of thing that @jeremy might be particularly keen on.
I ran their examples and reproduced the results from the paper, but running on my own (admittedly small) datasets, I’m not getting anywhere near a basic random forest. Also, even on a GPU, NODE seems to be painfully slow. I suspect you might be able to speed things up considerably using high / one-cycle learning rates, but haven’t had a chance to play yet.
I tried NODE. I didn't get very good results, and it was very buggy when used with anything other than the Adult dataset, so I stopped working with it. I couldn't match a tabular_learner with NODE.
What do you mean buggy? Hard to get to converge or actual bugs?
tabular_learner is just an MLP + embedding layers for categorical variables, correct?
I tried taking advantage of the tabular embeddings, so I replaced the main bit of the model with the NODE layer. I was able to get it working on the Adult dataset, but I could not with Rossmann. Actual bugs. I haven't had the time to work through them yet. Try it out; if you get it working, please share it. I'll share my (very) messy NODE notebook I started, and perhaps you can see something I missed.
@muellerzr, did you finally get NODE working properly?
@nestorDemeure "properly" is not exactly what I would call it. I'm revisiting it in ~2-3 weeks or so, when I have time to focus on it for the course.
@nestorDemeure what I can say is that it's painfully slow to train and has a TON of parameters, so I wasn't even considering teaching it. For comparison:
fastai tabular learner: 530k parameters
TabNet-L (the largest variant): 1.75M parameters
NODE with two ODST blocks (the bare minimum recommended, with a depth of 6 and a tree_dim of 3): 19,810,232 parameters
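For anyone curious where that huge NODE count comes from, here's a back-of-envelope sketch in plain Python based on the ODST parameterization described in the paper (per-split feature-selection logits, thresholds, and temperatures, plus a response tensor with one entry per leaf). The tree count (1024 per layer) and post-embedding input width (42) below are my guesses, not confirmed repo defaults, but they illustrate why the total explodes: the response term grows as 2^depth, and the dense connectivity means layer 2's feature-selection matrix sees the raw features plus all of layer 1's outputs.

```python
def odst_layer_params(in_features, num_trees=1024, depth=6, tree_dim=3):
    """Rough parameter count for one ODST layer (assumed parameterization:
    selection logits, split thresholds, temperatures, leaf responses)."""
    selection = in_features * num_trees * depth    # which feature each split attends to
    thresholds = num_trees * depth                 # one threshold per split
    temperatures = num_trees * depth               # one temperature per split
    responses = num_trees * tree_dim * 2 ** depth  # one tree_dim output per 2^depth leaves
    return selection + thresholds + temperatures + responses

n_features = 42  # hypothetical post-embedding input width for Adult
layer1 = odst_layer_params(n_features)
# Dense block: layer 2's input is the raw features plus layer 1's outputs,
# so its selection matrix dominates the total.
layer2 = odst_layer_params(n_features + 1024 * 3)
print(layer1, layer2, layer1 + layer2)
```

With these assumed settings the two layers together land in the same ~19.8M ballpark as the count above, with the second (densely connected) layer accounting for nearly all of it.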
Average epoch time on a P100 is ~1 minute (this is on the Adult dataset); the other two were considerably slower. And if I attempt a bigger batch size (say 4096, which should be doable for most tabular models), I get a CUDA OOM.
Those were the main drawbacks I saw.
Thank you for the reply; it makes sense that under those conditions one would not want to use it.
I guess that, if accuracy matters, you currently recommend using the fastai architecture?
I haven't found a NN architecture yet that can top it.
That being said though, I have not experimented with DeepGBM yet: https://github.com/motefly/DeepGBM