Data augmentation for tabular data

stella · May 3, 2020, 4:46pm

What are some of the effective data augmentation techniques for non-image data?
I'm trying to do semi-supervised learning on non-image data with MixMatch- and FixMatch-like techniques, which all require several different ways to augment the data. For tabular data whose columns are not individually meaningful (like embeddings), is there any way to augment other than adding Gaussian noise?
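MixMatch and FixMatch both consume several augmented views of the same unlabeled rows. One way to get those views from Gaussian noise alone is to draw the noise at different scales. Below is a minimal NumPy sketch, assuming the noise should be scaled relative to each column's standard deviation. The function name `gaussian_views` and the 0.05/0.3 "weak"/"strong" scales are hypothetical choices, not values from this thread.

```python
import numpy as np

def gaussian_views(x, scales, rng):
    """Return one noisy copy of x per entry in `scales`.

    x      : (n_samples, n_features) float array, e.g. embeddings
    scales : noise std for each view, in units of each column's std
    """
    # Per-column std, shape (1, n_features); constant columns get no noise
    col_std = x.std(axis=0, keepdims=True)
    return [x + rng.normal(0.0, s, size=x.shape) * col_std for s in scales]

rng = np.random.default_rng(0)
x_unlabeled = rng.normal(size=(8, 4))

# Hypothetical FixMatch-style pair: a lightly and a heavily perturbed view
weak, strong = gaussian_views(x_unlabeled, scales=[0.05, 0.3], rng=rng)
print(weak.shape, strong.shape)  # (8, 4) (8, 4)
```

Scaling by column std keeps the perturbation comparable across features with different ranges; for MixMatch-style training the same function can simply be called with K equal scales to get K views.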

Thanks!

jeremyeast · May 3, 2020, 8:02pm

There are several I have used with tabular models in fast.ai; check out the SMOTE library.
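SMOTE (Synthetic Minority Over-sampling Technique) creates new rows by interpolating between a sample and one of its nearest neighbours from the same class. The sketch below re-implements only that interpolation idea in plain NumPy to show what it does. `smote_like` is a hypothetical helper, not the API of any SMOTE package; in practice a packaged implementation such as imbalanced-learn's `SMOTE` would be the usual choice. Note that SMOTE needs class labels, so it applies to the labeled part of a semi-supervised setup rather than to the unlabeled rows.

```python
import numpy as np

def smote_like(x_minority, n_new, k=5, rng=None):
    """Synthesize n_new rows by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(x_minority)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Brute-force pairwise Euclidean distances within the minority class
    d = np.linalg.norm(x_minority[:, None, :] - x_minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # (n, k) neighbour indices
    base = rng.integers(0, n, size=n_new)       # random anchor rows
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return x_minority[base] + gap * (x_minority[nb] - x_minority[base])

rng = np.random.default_rng(0)
x_min = rng.normal(size=(10, 3))
synthetic = smote_like(x_min, n_new=20, k=3, rng=rng)
print(synthetic.shape)  # (20, 3)
```

The O(n²) distance matrix is fine for a small minority class; for larger data a nearest-neighbour index would be the more reasonable choice.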

harikrishnanrajeev · October 27, 2020, 2:32am

@stella, thank you for posting this topic.

I have tried swap noise and MixMatch, but found that they did not help much. Then again, I think it all depends on the data you are applying them to.
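For reference, swap noise replaces a random fraction of cells with the value from the same column of another randomly chosen row. MixMatch's mixing step is mixup with the coefficient folded as λ' = max(λ, 1 − λ), so each mixed row stays closer to its first input. The sketch below shows only these two pieces in NumPy; it omits MixMatch's label guessing and sharpening, and the helper names, `p=0.15` and `alpha=0.75` are illustrative assumptions.

```python
import numpy as np

def swap_noise(x, p, rng):
    """With probability p, replace each cell by the same column's value
    taken from a randomly chosen row."""
    n, m = x.shape
    mask = rng.random((n, m)) < p
    donor_rows = rng.integers(0, n, size=(n, m))
    swapped = x[donor_rows, np.arange(m)]  # element (i, j) = x[donor_rows[i, j], j]
    return np.where(mask, swapped, x)

def mixup(x1, y1, x2, y2, alpha, rng):
    """Mixup as used inside MixMatch: lam ~ Beta(alpha, alpha), then
    lam = max(lam, 1 - lam) so the output stays closer to (x1, y1)."""
    lam = rng.beta(alpha, alpha, size=(len(x1), 1))
    lam = np.maximum(lam, 1.0 - lam)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
y = np.eye(3)[rng.integers(0, 3, size=6)]  # one-hot labels for 3 classes

x_swap = swap_noise(x, p=0.15, rng=rng)
perm = rng.permutation(len(x))             # mix each row with a shuffled partner
x_mix, y_mix = mixup(x, y, x[perm], y[perm], alpha=0.75, rng=rng)
print(x_swap.shape, x_mix.shape, y_mix.shape)  # (6, 4) (6, 4) (6, 3)
```

Swap noise keeps every value inside each column's observed distribution, which is one reason it is often tried on tabular data where Gaussian noise could produce unrealistic values.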

In the end I went with adding Gaussian noise. I'm really keen to hear from others what worked and what did not.

delrosario · October 27, 2020, 2:51am

@harikrishnanrajeev, how much did adding Gaussian noise improve your metric compared with the non-augmented dataset?

harikrishnanrajeev · October 27, 2020, 10:33am

Compared with the non-augmented dataset, Gaussian noise augmentation added 3 to 3.5 points of accuracy.