I ran their examples and reproduced the results from the paper, but running on my own (admittedly small) datasets, I’m not getting anywhere near a basic random forest. Also, even on a GPU, NODE seems to be painfully slow. I suspect you could speed things up considerably with higher learning rates and a one-cycle schedule, but I haven’t had a chance to try yet.
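For reference, here’s a minimal sketch of what I mean by a one-cycle schedule, using PyTorch’s built-in `OneCycleLR`. The model here is just a toy MLP standing in for NODE, and all the sizes are made up; the point is only that the scheduler is stepped every batch:

```python
import torch
import torch.nn as nn

# Toy stand-in model -- swap in a NODE model here.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

epochs, steps_per_epoch = 3, 20
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-2, epochs=epochs, steps_per_epoch=steps_per_epoch
)

loss_fn = nn.CrossEntropyLoss()
for _ in range(epochs):
    for _ in range(steps_per_epoch):
        x = torch.randn(16, 10)            # fake batch of features
        y = torch.randint(0, 2, (16,))     # fake labels
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # step the schedule every batch, not every epoch
```

The LR warms up to `max_lr` and then anneals down to a tiny value by the end of training, which is what usually lets you get away with much higher peak rates.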
I tried NODE. The results weren’t great, and it was very buggy with anything other than the Adult dataset, so I stopped working with it. I couldn’t match a tabular_learner with NODE.
I tried taking advantage of the tabular embeddings, so I replaced the main bit of the model with the NODE layer. I was able to get it working on the Adult dataset, but I could not with Rossmann. Actual bugs, which I haven’t had the time to work through yet. Try it out, and if you get it working please share it. I’ll share my (very) messy NODE notebook I started, and perhaps you can see something I missed.
@nestorDemeure “properly” is not exactly what I would call it. I’m revisiting it in ~2-3 weeks or so, when I can have time to focus on it for the course.
@nestorDemeure what I can say is that it’s painfully slow to train and has a TON of parameters, so I wasn’t even thinking of teaching it at all. For comparison:
- fastai tabular learner: 530k parameters
- TabNet-L (the largest variant): 1.75M parameters
- NODE with two ODST blocks (the bare minimum recommended, with a depth of 6 and tree_dim of 3): 19,810,232 parameters
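If you want to check counts like these yourself, summing `numel()` over trainable parameters is all it takes. The model below is a toy stand-in; swap in your `tabular_learner` model, TabNet, or NODE instance:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy model: 100*200 + 200 biases + 200*2 + 2 biases = 20,602
toy = nn.Sequential(nn.Linear(100, 200), nn.ReLU(), nn.Linear(200, 2))
print(count_params(toy))  # 20602
```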
Average epoch time on a P100 is ~1 minute. The other two were considerably slower (this is on the Adult dataset). And if I attempt a bigger batch size (say 4096, which should be doable for most any tabular model), I get a CUDA OOM.
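One hypothetical way around the CUDA OOM is gradient accumulation: keep the per-step batch small and accumulate gradients so the effective batch is still large (e.g. 8 micro-batches of 512 ≈ one batch of 4096). Again, a toy model and random data stand in for NODE here:

```python
import torch
import torch.nn as nn

# Toy stand-in model -- in practice this would be the NODE model that OOMs.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

before = model[0].weight.detach().clone()  # snapshot to show weights move

accum_steps = 8   # 8 micro-batches of 512 ~= one effective batch of 4096
micro_batch = 512

optimizer.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(micro_batch, 10)
    y = torch.randint(0, 2, (micro_batch,))
    # Scale the loss so the accumulated gradient is the mean, not the sum.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()
optimizer.step()       # one optimizer step for the full effective batch
optimizer.zero_grad()
```

Memory scales with the micro-batch size, not the effective batch size, so this trades a little wall-clock time for fitting on the GPU.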