Unbalanced Structured Data

Celebi · July 25, 2018, 9:07pm

Hello.
I was wondering how you guys would approach a deep learning problem with unbalanced structured data. I am aware that this problem is often circumvented in image analysis through data augmentation, but how would you apply that to structured data?
Thanks.

lokeshdangi · July 26, 2018, 2:15am

You can try oversampling the class with fewer data points, or undersampling the class with more data points.
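Here is a minimal NumPy sketch of both options. It assumes `X` is a 2-D feature array and `y` is a 1-D NumPy label array. The function name `random_resample` is hypothetical. If you would rather not write it yourself, the imbalanced-learn package offers `RandomOverSampler` and `RandomUnderSampler` for the same job.

```python
import numpy as np

def random_resample(X, y, strategy="over", seed=0):
    """Balance classes by random over- or undersampling (hypothetical helper).

    strategy="over": duplicate rows (with replacement) from each smaller class
    until it matches the largest class.
    strategy="under": keep a random subset (without replacement) of each larger
    class, down to the size of the smallest class.
    """
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if strategy == "over" else counts.min()
    parts = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if strategy == "over":
            extra = rng.choice(idx, size=target - n, replace=True)
            idx = np.concatenate([idx, extra])
        else:
            idx = rng.choice(idx, size=target, replace=False)
        parts.append(idx)
    # Shuffle so the duplicated minority rows are not grouped at the end.
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]

# Usage: 8 rows of class 0, 2 rows of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_over, y_over = random_resample(X, y, "over")
print(np.bincount(y_over))   # [8 8]
X_under, y_under = random_resample(X, y, "under")
print(np.bincount(y_under))  # [2 2]
```

Apply this to the training split only. The test split should keep the original class distribution so the evaluation reflects the data the model will actually see.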

knesgood · July 27, 2018, 9:00pm

@Celebi - if you look around the forums, you’ll find a few threads on this. The prevailing answer seems to be to over-sample the under-represented class.

pacocoli · August 2, 2018, 8:16pm

When you oversample (or ‘resample’), you introduce a sampling error. Make sure to seed your random number generator with a range of different values, so you can estimate that ‘resampling’ error by comparing the results of the train/test splits for each resampled set. In my experience, deep learning is more tolerant of class imbalance than is usually claimed, as long as the test sample keeps the same class imbalance as the original data.

If the problem is finding fraud, I would look at a cascade of fits: first segment the data set by separating out the known, obvious ‘not fraud’ cases. Then try working backwards toward that really small class population by adding intermediate classifications, e.g., likely to move up in credit, likely to move down, likely to commit fraud, etc.
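Below is a minimal sketch of the seed-sweep idea, assuming a binary 0/1 label. Everything in it is hypothetical and for illustration only: the helper names, the synthetic "fraud" data, and the toy gradient-descent logistic regression that stands in for a deep model. Each seed controls both the stratified train/test split and the oversampling. Only the training split is oversampled, so the test split keeps the original imbalance. The spread of the scores across seeds gives a rough estimate of the resampling error.

```python
import numpy as np

def stratified_split(y, test_frac, rng):
    """Return train/test indices with the same class proportions in each split."""
    train, test = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(test_frac * len(idx)))
        test.append(idx[:n_test])
        train.append(idx[n_test:])
    return np.concatenate(train), np.concatenate(test)

def oversample(idx, y, rng):
    """Randomly duplicate indices of smaller classes until all match the largest."""
    classes, counts = np.unique(y[idx], return_counts=True)
    parts = [idx]
    for cls, n in zip(classes, counts):
        cls_idx = idx[y[idx] == cls]
        parts.append(rng.choice(cls_idx, size=counts.max() - n, replace=True))
    return np.concatenate(parts)

def fit_predict_logreg(X_train, y_train, X_test, lr=0.1, epochs=500):
    """Toy stand-in for a real model: batch gradient-descent logistic regression."""
    Xb = np.hstack([X_train, np.ones((len(X_train), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y_train) / len(y_train)
    Xt = np.hstack([X_test, np.ones((len(X_test), 1))])
    return (Xt @ w > 0).astype(int)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so the rare class counts as much as the common one."""
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)])

def resampling_scores(X, y, seeds, test_frac=0.3):
    """For each seed: split, oversample the training part, fit, and score."""
    scores = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        train, test = stratified_split(y, test_frac, rng)
        train = oversample(train, y, rng)  # test split keeps the original imbalance
        y_pred = fit_predict_logreg(X[train], y[train], X[test])
        scores.append(balanced_accuracy(y[test], y_pred))
    return np.array(scores)

if __name__ == "__main__":
    # Hypothetical imbalanced data: 950 "not fraud" rows vs 50 "fraud" rows.
    data_rng = np.random.default_rng(123)
    X = np.vstack([data_rng.normal(0.0, 1.0, (950, 2)),
                   data_rng.normal(1.5, 1.0, (50, 2))])
    y = np.array([0] * 950 + [1] * 50)
    scores = resampling_scores(X, y, seeds=range(10))
    print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Balanced accuracy is used here because plain accuracy would look high even if the model never predicted the rare class. A large standard deviation across seeds suggests that the resampling itself, not just the model, is driving the results.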