Synthesizing new tabular data carries certain risks and challenges, whatever the technique:
- Image transforms are easy to understand and interpret; structured-data transforms usually are not.
- For time series, some trend insights might be lost when the data is synthesized with GANs or statistical transforms.
For instance, take the e-commerce sales time series for a particular SKU. The item might be subject to category-level seasonality (births peak in certain months, most electronics are bought around holidays) as well as competitor pricing, distribution problems, delivery, and changes in user-perceived quality, among other things. The synthesized data might not capture these “trend” insights.
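To make the risk concrete, here is a minimal sketch (not taken from any of the sources referenced here) that compares a seasonal series with a naively synthesized copy. Everything in it is an assumption for illustration: the `real_sales` series is simulated rather than real SKU data, and the "synthesis" step is plain i.i.d. resampling from the empirical distribution, standing in for any transform that ignores time order. The lag-12 autocorrelation is used as a rough proxy for yearly seasonality.

```python
import numpy as np


def lag_autocorr(x, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])


rng = np.random.default_rng(0)

# Hypothetical "real" series: 10 years of monthly sales with a yearly cycle
# plus noise. These numbers are simulated for illustration only.
months = np.arange(120)
real_sales = (
    100
    + 30 * np.sin(2 * np.pi * months / 12)
    + rng.normal(0, 5, size=months.size)
)

# Naive "synthesis": draw i.i.d. samples from the empirical distribution.
# The marginal distribution of sales values is preserved, but the ordering
# in time (and with it the seasonal pattern) is not.
synthetic_sales = rng.choice(real_sales, size=real_sales.size, replace=True)

real_ac12 = lag_autocorr(real_sales, 12)
synth_ac12 = lag_autocorr(synthetic_sales, 12)

print(f"lag-12 autocorrelation, real:      {real_ac12:.2f}")
print(f"lag-12 autocorrelation, synthetic: {synth_ac12:.2f}")
```

The two series have near-identical histograms, so a check on marginal distributions alone would pass. Only a time-aware diagnostic such as the autocorrelation shows that the seasonal structure is gone, which is why synthetic time series are worth validating on temporal statistics as well.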
Handling imbalanced class distribution:
GANs for Data Augmentation of Structured Data:
- This blog post explains the different types of GANs used for structured (categorical?) data augmentation.
- e-Commerce-Conditional GAN from Amazon Machine Learning (ICLR 2018) is a great read on applying GANs to a specific challenge or dataset.