Deep Learning on Tabular Data with huge imbalance

Hello!

I’d like to try my hand at deep learning for tabular data, but my two classes are hugely imbalanced: one class makes up 99% of the dataset and the other only 1%.

So I tried to run the classroom notebook with the ADULTS dataset and change the metric. Accuracy is a poor choice under this kind of imbalance: a model that always predicts the 99% class already scores 99% accuracy without ever detecting the rare class.
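To make the point about accuracy concrete, here is a minimal pure-Python sketch. The 990/10 label split and the helper names `accuracy` and `fbeta_score` are made up for illustration and are not taken from the notebook. It compares accuracy with an F-beta score for a "model" that always predicts the majority class.

```python
# Hypothetical toy labels: 990 examples of class 0 and 10 of class 1.
y_true = [0] * 990 + [1] * 10
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 1000


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fbeta_score(y_true, y_pred, beta=1.0, positive=1):
    """F-beta for the `positive` class, computed from hard predictions."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when the positive class is never predicted nor present.
    return (1 + b2) * tp / denom if denom else 0.0


acc = accuracy(y_true, y_pred)
f1 = fbeta_score(y_true, y_pred, beta=1.0)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Accuracy looks excellent here while F1 for the rare class is zero, which is why a metric built on precision and recall is more informative in this setting.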

I replaced accuracy with fbeta and got the following error:
RuntimeError: The size of tensor a (2) must match the size of tensor b (64) at non-singleton dimension 1

Two questions:

-> Should I use fbeta as a metric, or take another approach?
-> If I should use fbeta, what causes the error above?

Thank you very much and have a nice day!

Hey, I faced a similar issue. I’m not sure what the problem with the fbeta function is, but you can instantiate the FBeta class as the metric instead, e.g. metrics=FBeta(beta=1) to get the F1 score.
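One plausible cause of the error, which I have not confirmed against the library source: the fbeta function may expect multi-label targets with the same shape as the predictions, whereas a single-label tabular model yields predictions of shape (64, 2) and targets of shape (64,). The PyTorch sketch below only reproduces that shape mismatch with an elementwise product; the 0.5 threshold and the variable names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

batch_size, n_classes = 64, 2
logits = torch.randn(batch_size, n_classes)           # model output, shape (64, 2)
targets = torch.randint(0, n_classes, (batch_size,))  # class indices, shape (64,)

# Thresholded predictions, as a multi-label metric would compute them.
hard_preds = (logits.sigmoid() > 0.5).float()

try:
    # (64, 2) * (64,) cannot broadcast: trailing dimensions are 2 and 64.
    hard_preds * targets.float()
    mismatch = None
except RuntimeError as err:
    mismatch = str(err)

print(mismatch)

# With one-hot targets the shapes agree and the product is well defined.
one_hot = F.one_hot(targets, n_classes).float()       # shape (64, 2)
product = hard_preds * one_hot
print(product.shape)
```

If this is indeed the cause, a metric that takes class indices directly avoids the problem without reshaping the targets.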

Hi Benjamin, I can help if you need it.

Yes, I do!

Try changing your loss function to one that uses class weights (as described in the PyTorch documentation).
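A minimal sketch of what that could look like, using the `weight` argument of `torch.nn.CrossEntropyLoss`. The class counts mirror the 99%/1% split from the question, and inverse-frequency weighting is just one common heuristic, so treat both as assumptions rather than a recommendation for your exact data.

```python
import torch
import torch.nn as nn

# Assumed class counts mirroring a 99% / 1% split.
class_counts = torch.tensor([990.0, 10.0])
# Inverse-frequency weights: the rare class gets the larger weight.
weights = class_counts.sum() / (len(class_counts) * class_counts)

plain_loss = nn.CrossEntropyLoss()
weighted_loss = nn.CrossEntropyLoss(weight=weights)

# Two examples that the model both assigns to class 0;
# the second one actually belongs to the rare class 1.
logits = torch.tensor([[2.0, -2.0], [2.0, -2.0]])
targets = torch.tensor([0, 1])

plain = plain_loss(logits, targets).item()
weighted = weighted_loss(logits, targets).item()
print(f"weights={weights.tolist()}, plain={plain:.3f}, weighted={weighted:.3f}")
```

With the weights in place, the mistake on the rare class dominates the batch loss, so the model is pushed harder to stop ignoring that class. How the loss is handed to the learner depends on the library version you are using.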

Let me know if that works.