Deep Learning on Tabular Data with Huge Class Imbalance

bdubreu · February 28, 2019, 9:46am

Hello!

I’d like to try my hand at deep learning for tabular data, but my two classes are hugely imbalanced: one class is 99% of the dataset and the other is 1%.

So I tried running the course notebook with the ADULTS dataset and changing the metric, because accuracy is a poor choice under imbalance: a model can reach 99% accuracy just by always predicting the majority class.
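To make the point about accuracy concrete, here is a minimal Python sketch. It assumes a made-up set of 1,000 labels with the 99%/1% split described above and a hypothetical baseline that always predicts the majority class; none of it comes from the notebook.

```python
# Hypothetical data: 990 majority-class rows (0) and 10 minority-class rows (1).
y_true = [0] * 990 + [1] * 10

# Baseline "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of rows where the prediction equals the label.
accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

# Minority recall: fraction of actual minority rows that were predicted as minority.
true_positives = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
minority_recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy        = {accuracy:.2f}")
print(f"minority recall = {minority_recall:.2f}")
```

The baseline looks excellent on accuracy while never finding a single minority row, which is why a metric built on precision and recall is more informative here.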

I tried to change accuracy to fbeta, and got the following error:
RuntimeError: The size of tensor a (2) must match the size of tensor b (64) at non-singleton dimension 1

Two questions:

-> Should I use fbeta as a metric, or take another approach?
-> If I should use fbeta, what is causing the error above?

Thank you very much and have a nice day!

stvad · March 15, 2019, 11:29pm

Hey, I faced a similar issue. I'm not sure what the problem with the fbeta function is, but you can instantiate the metric with the FBeta class instead, i.e. metrics=FBeta(beta=1) to get the F1 score.
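For reference, this is roughly what an F-beta score computes for a single-label binary problem, written as a plain-Python sketch. It is not fastai's implementation: the function name f_beta and the example labels are made up for illustration, and class 1 is assumed to be the positive (minority) class. As an aside, the fbeta function in fastai v1 appears to be intended for multi-label targets (one column per class), which would explain the shape mismatch against a batch of 64 single labels; treat that as an assumption rather than a confirmed diagnosis.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta for binary labels, treating 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so score 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    # beta > 1 weights recall more heavily; beta < 1 favors precision.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up example: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

f1 = f_beta(y_true, y_pred, beta=1)
f2 = f_beta(y_true, y_pred, beta=2)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

With beta=1 precision and recall count equally; a larger beta penalizes missed minority rows more, which may suit a 99%/1% split better if false negatives are the costly error.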

jeremyeast · March 17, 2019, 4:22am

Hi Benjamin, I can help if you need it.

bdubreu · March 19, 2019, 7:58am

Yes, I do!

jeremyeast · April 6, 2019, 11:09pm

Try changing your loss function to use class weights, as described in the PyTorch documentation.

Let me know if that works.
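As a rough sketch of the weighting idea, the plain-Python example below derives inverse-frequency class weights and applies them in a weighted cross-entropy. The weighting scheme, the helper names and the toy probabilities are assumptions for illustration, not something stated in this thread. The weighted mean mirrors how torch.nn.CrossEntropyLoss averages when given a weight argument with its default reduction; in PyTorch the same per-class weights would be passed to that argument as a tensor.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical 99%/1% dataset: 99 rows of class 0 and one row of class 1.
labels = [0] * 99 + [1]
weights = inverse_frequency_weights(labels)

# Toy model that predicts class 0 with probability 0.9 for every row.
probs = [(0.9, 0.1)] * len(labels)

unweighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(probs, labels, weights)
print(f"unweighted loss = {unweighted:.4f}")
print(f"weighted loss   = {weighted:.4f}")
```

With these weights both classes contribute equally to the loss, so a model that ignores the minority class is penalized far more than under the unweighted loss.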