Deep Learning on Tabular Data with a huge class imbalance



I’d like to try my hand at Deep Learning for Tabular Data. However, my two classes are hugely imbalanced.

One class makes up 99% of the dataset and the other only 1%.

So I tried running the classroom notebook with the ADULTS dataset while changing the metric, because accuracy is a poor choice for imbalanced data: the model can simply learn to always predict the 99% class and still score 99% accuracy.
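To make the accuracy problem concrete, here is a minimal sketch with hypothetical labels (990 rows of class 0, 10 rows of class 1) and a "model" that always predicts the majority class. It scores 99% accuracy while finding none of the minority-class rows.

```python
# Hypothetical labels mirroring a 99%/1% split: 990 majority (0), 10 minority (1).
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Fraction of rows predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: share of true 1s that the model found.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = true_pos / sum(t == 1 for t in y_true)

print(f"accuracy = {accuracy:.2f}")               # accuracy = 0.99
print(f"minority recall = {minority_recall:.2f}")  # minority recall = 0.00
```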

I tried replacing accuracy with fbeta and got the following error:
RuntimeError: The size of tensor a (2) must match the size of tensor b (64) at non-singleton dimension 1

Two questions:

-> Should I use fbeta as a metric, or take another approach?
-> If I should use fbeta, what is wrong with my code?
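A likely cause of the RuntimeError, assuming fastai v1: its `fbeta` function appears to be written for multi-label problems, where the targets have the same shape as the predictions and the two are multiplied elementwise. With two output activations and one class index per row, a batch of 64 gives predictions of shape (64, 2) and targets of shape (64,), and those cannot be broadcast together. PyTorch follows NumPy's broadcasting rules, so this NumPy sketch reproduces the same mismatch (NumPy raises a ValueError where PyTorch raises the RuntimeError quoted above). The random scores, the batch size of 64, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

batch_size = 64
rng = np.random.default_rng(0)

# Single-label, 2-class model output: one score per class per row.
preds = rng.random((batch_size, 2))             # shape (64, 2)
# Single-label targets: one class index per row.
targets = rng.integers(0, 2, size=batch_size)   # shape (64,)

# Multi-label style metric code multiplies predictions by targets elementwise.
try:
    preds * targets  # trailing dims 2 and 64 differ, so broadcasting fails
    mismatch = False
except ValueError as err:
    mismatch = True
    print("broadcast error:", err)

# One-hot encoding gives the targets the same shape as the predictions,
# so the elementwise product is now well defined.
one_hot = np.eye(2)[targets]                    # shape (64, 2)
product = (preds > 0.5) * one_hot
print(product.shape)                            # (64, 2)
```

One-hot targets remove the shape error, but as far as I can tell the multi-label metric would still average a per-row score, so for a single-label problem a metric built for class-index targets is the more natural fit.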

Thank you very much

Hey, I faced a similar issue. I'm not sure what the problem with the fbeta function is, but you can use the FBeta class to instantiate the metric instead, i.e. metrics=FBeta(beta=1) to get the F1 score.
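For reference, F-beta combines precision P and recall R as (1 + β²)·P·R / (β²·P + R), so β = 1 gives the F1 score. The sketch below is a plain-Python illustration of that formula for a single-label, 2-class problem, not fastai's implementation: it assumes class 1 (the rare class) is the positive class and takes the argmax of each row of hypothetical model scores.

```python
def f_beta(y_true, y_pred, beta=1.0, positive=1):
    """Binary F-beta for hard predictions, treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so score is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical 2-class scores (one per class) and true labels.
scores = [[2.0, 0.1], [0.3, 1.5], [1.2, 0.9], [0.2, 0.8], [1.0, 1.1]]
labels = [0, 1, 1, 0, 1]

# Single-label prediction: index of the highest score in each row.
preds = [max(range(len(row)), key=row.__getitem__) for row in scores]

print(preds)                             # [0, 1, 0, 1, 1]
print(f"{f_beta(labels, preds):.3f}")    # 0.667
```

Here TP = 2, FP = 1 and FN = 1, so precision and recall are both 2/3 and F1 is 2/3.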


Hi Benjamin, I can help you if you ever need it.

Yes, I do!

Try changing your loss function so that it uses class weights (as described in the PyTorch documentation).
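A minimal NumPy sketch of what class weights do to the loss. It mirrors the documented formula for `torch.nn.CrossEntropyLoss(weight=..., reduction='mean')`: each sample's negative log-likelihood is scaled by the weight of its true class, and the sum is divided by the total weight. The inverse-frequency weighting (n_samples / (n_classes · count_c)) is one common heuristic and an assumption here, as are the constant logits of a model that always favours the majority class.

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency weights: n_samples / (n_classes * count_c). One heuristic among several."""
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * counts)

def weighted_cross_entropy(logits, targets, weight):
    """Weighted-mean cross-entropy, following the CrossEntropyLoss(weight=...) formula."""
    # Log-softmax, shifted by the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]  # per-sample loss
    w = weight[targets]                                  # weight of each sample's true class
    return (w * nll).sum() / w.sum()

# Hypothetical 99%/1% labels, as in the question.
labels = np.array([0] * 990 + [1] * 10)
w = class_weights(labels, n_classes=2)
print(w)  # minority class weight is 99x the majority class weight

# A model that always favours class 0.
logits = np.tile([2.0, 0.0], (len(labels), 1))
plain = weighted_cross_entropy(logits, labels, np.ones(2))
balanced = weighted_cross_entropy(logits, labels, w)
print(plain, balanced)  # the weighted loss is much larger: minority errors now count
```

In PyTorch itself the equivalent would be `torch.nn.CrossEntropyLoss(weight=torch.tensor(w, dtype=torch.float32))`; with fastai v1 you would typically assign that to `learn.loss_func`, keeping the weight tensor on the same device as the model.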

Let me know if that works.