Using GAN for generating new tabular data


(Amrit) #1

Hi all,

I wanted to reach out to see if anyone has tried using GANs to generate tabular data. Many establishments own tabular data structured as records in a database, but suffer from the problem of imbalanced data. For images, you can use augmentation to generate new image data. Is there a way to generate new tabular data?

Thanks,
Amrit


(nirant) #2

Hey @amritv,

There are certain risks and challenges with “synthesizing new tabular data” using any technique:

  • Image transforms are easy to understand and “interpret”; structured data transforms usually are not
  • In the case of time series, some trend insights might be lost when the data is synthesized via GANs or statistical transforms

For instance, let’s take e-commerce sales time series data for a particular SKU. This item might be subject to category influences (most babies are born in certain months, most electronics are bought on holidays) as well as competitor pricing, distribution pains, delivery, and changes in user-perceived quality, among other things. The synthesized data might not capture these “trend” insights.
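
To make the risk concrete, here is a minimal sketch in plain Python. It assumes a made-up monthly sales series with a yearly cycle, and it uses the crudest possible "generator" (resampling observed values independently) as a stand-in for any method that matches the value distribution but ignores ordering. The `lag_autocorr` helper and all numbers are hypothetical, for illustration only.

```python
import math
import random
from statistics import mean

def lag_autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mu = mean(series)
    num = sum((series[t] - mu) * (series[t - lag] - mu)
              for t in range(lag, len(series)))
    den = sum((x - mu) ** 2 for x in series)
    return num / den

rng = random.Random(0)

# Toy monthly sales: a 12-month cycle plus a little noise (10 years).
real = [100 + 30 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 3)
        for t in range(120)]

# Naive synthetic series: draw each value independently from the real values.
synthetic = [rng.choice(real) for _ in real]

real_ac = lag_autocorr(real, 12)
synth_ac = lag_autocorr(synthetic, 12)

print(f"mean  real={mean(real):.1f}  synthetic={mean(synthetic):.1f}")
print(f"lag-12 autocorrelation  real={real_ac:.2f}  synthetic={synth_ac:.2f}")
```

The two series have roughly the same mean, but the yearly pattern shows up as a high lag-12 autocorrelation only in the real one. A check like this is a cheap sanity test to run on any synthesized time series before trusting it.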

Handling imbalanced class distribution:
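
Before reaching for a GAN, it can help to have a simple baseline to compare against. Below is a minimal sketch of random oversampling of the minority class in plain Python. The `oversample_minority` helper and the toy records are hypothetical, not taken from any library, and duplicating rows adds no new information, so treat this only as a reference point.

```python
import random
from collections import Counter

def oversample_minority(rows, label_key, seed=0):
    """Copy randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra copies with replacement to reach the majority size.
        balanced.extend(dict(rng.choice(group))
                        for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy records (made-up data): 8 "ok" rows and 2 "fraud" rows.
records = [{"amount": 10 + i, "label": "ok"} for i in range(8)]
records += [{"amount": 500 + i, "label": "fraud"} for i in range(2)]

balanced = oversample_minority(records, "label")
counts = Counter(row["label"] for row in balanced)
print(counts)
```

Oversampling should be applied to the training split only, after the train/test split, so that copies of the same row do not leak into the evaluation data.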

GANs for Data Augmentation of Structured Data:

  • This blog post explains the different types of GANs being used for structured (categorical?) data augmentation.
  • e-Commerce-Conditional GAN from Amazon Machine Learning (ICLR 2018) is a great read on applying GANs to a specific challenge or dataset.

(Amrit) #3

Thanks @nirantk, I really appreciate your response and the links, especially the paper.