Autoencoder for tabular data

Let’s say I have some tabular data and I want to use a denoising autoencoder to generate good features for a downstream neural network, a bit like the solution in this Kaggle post: https://www.kaggle.com/c/petfinder-adoption-prediction/discussion/88740

I understand how to train an autoencoder on purely continuous data, but what do you do when the input mixes categorical and continuous variables? For continuous data, you can simply use an MSE loss to measure how close each reconstructed variable is to the corresponding input variable. But for categorical variables, do you have to use a cross-entropy loss for each one and then somehow blend the continuous and categorical losses?
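One common way to do the blending the question describes is to sum an MSE term over the continuous columns and a cross-entropy term over each categorical column, with a weight between the two. Below is a minimal plain-Python sketch for a single row, assuming the decoder outputs one float per continuous column and one list of logits per categorical column. The function names and the `cat_weight` parameter are hypothetical, chosen for illustration only. In PyTorch the same idea would typically use `nn.MSELoss` for the continuous head and one `nn.CrossEntropyLoss` per categorical head.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one categorical column: -log softmax(logits)[target]."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def mixed_reconstruction_loss(cont_pred, cont_true, cat_logits, cat_true, cat_weight=1.0):
    """Blend MSE on continuous columns with cross-entropy on categorical columns.

    cont_pred, cont_true: floats, one per continuous column.
    cat_logits: one list of logits per categorical column.
    cat_true: one integer category index per categorical column.
    cat_weight: hypothetical weight balancing the two terms (a tuning knob).
    """
    # Mean squared error over the continuous columns.
    mse = sum((p - t) ** 2 for p, t in zip(cont_pred, cont_true)) / len(cont_true)
    # Mean cross-entropy over the categorical columns.
    ce = sum(softmax_cross_entropy(l, t) for l, t in zip(cat_logits, cat_true)) / len(cat_true)
    return mse + cat_weight * ce

# Two continuous columns and one categorical column with two categories.
loss = mixed_reconstruction_loss([0.5, 1.0], [0.0, 1.0], [[0.0, 0.0]], [1])
```

Because MSE and cross-entropy live on different scales (especially if continuous columns are not standardized), the weight between them is usually something to tune rather than a fixed constant.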

Thanks,


It's a great, informative post.


I have a similar problem: I am trying to combine text and categorical variables. For categorical variables, I believe you can use swap noise to corrupt the input and augment the data. The loss is the usual reconstruction loss, so the network learns to reconstruct the original input from its noisy version.
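Swap noise, as usually described, replaces each cell, with some probability, by the value from the same column of a randomly chosen row, so every corrupted value is still a valid value for that column. Here is a minimal plain-Python sketch assuming the data is a list of rows; the function name `swap_noise` and the default rate `p=0.15` are illustrative assumptions, not taken from the post.

```python
import random

def swap_noise(rows, p=0.15, rng=None):
    """Return a corrupted copy of `rows` using swap noise.

    Each cell is, with probability p, replaced by the value in the same
    column of a randomly chosen donor row. Works for categorical and
    continuous columns alike. The input `rows` is left unmodified.
    """
    rng = rng or random.Random()
    n_rows = len(rows)
    noisy = [list(r) for r in rows]  # copy so the clean data stays intact
    for row in noisy:
        for j in range(len(row)):
            if rng.random() < p:
                donor = rng.randrange(n_rows)  # may pick the same row (no-op)
                row[j] = rows[donor][j]
    return noisy

data = [["a", 1.0], ["b", 2.0], ["c", 3.0]]
noisy = swap_noise(data, p=0.5, rng=random.Random(0))
```

For denoising training, each batch would be fed as `swap_noise(batch)` while the reconstruction loss is computed against the clean `batch`.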