I’ve played around with a few, but the one I’m settling on for now is MSE + MCE (mean cross-entropy): MSE for the continuous outputs plus the mean of the per-column cross-entropies for the categoricals, which is at least consistent across models of varying sizes.
I tried some other metrics based on balancing the error between the continuous and categorical elements, but it was hard to interpret the loss.
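For concreteness, here’s a minimal sketch of what I mean by MSE + MCE. The function name and tensor shapes are my own illustration, not fixed API; it assumes the model emits one logits tensor per categorical column:

```python
import torch
import torch.nn.functional as F

def mse_plus_mce(cont_pred, cont_target, cat_logits, cat_targets):
    """MSE over the continuous outputs plus the mean of the per-column
    cross-entropies over the categorical outputs.

    cat_logits: list of (batch, n_classes_i) tensors, one per column.
    cat_targets: list of (batch,) long tensors of class indices.
    """
    mse = F.mse_loss(cont_pred, cont_target)
    mce = torch.stack([
        F.cross_entropy(logits, target)
        for logits, target in zip(cat_logits, cat_targets)
    ]).mean()
    return mse + mce
```

Because each cross-entropy term is already averaged over the batch and we take the mean over columns, the categorical part stays on a comparable scale no matter how many categorical columns the model has, which is the consistency property I care about.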
So far I’m able to train the model to a validation loss of 0.51 on normally distributed continuous variables with a stdev of 1, which I think is good, but I’ll need to compare the outputs.
It’s still underfitting slightly, which may be the result of using both swap-column data augmentation and dropout, so I’ll have to explore lowering the dropout. Eventually I’ll need to run an ablation study.
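In case “swap-column augmentation” is unclear, here is a sketch of the kind of corruption I have in mind: with some probability per cell, a value is replaced by the same column’s value from a random other row, so the corrupted value still comes from the column’s marginal distribution. The function and the probability are illustrative, not my exact implementation:

```python
import torch

def swap_column_augment(x, p=0.1):
    """Sketch of swap-column augmentation on a (rows, cols) tensor:
    each cell is, with probability p, replaced by the value in the
    same column of a randomly chosen row."""
    n, d = x.shape
    src_rows = torch.randint(n, (n, d))     # random source row per cell
    mask = torch.rand(n, d) < p             # which cells to corrupt
    swapped = torch.gather(x, 0, src_rows)  # values drawn within-column
    return torch.where(mask, swapped, x)
```

The point of corrupting within columns is that the augmented rows remain plausible per-feature, so the model has to learn the joint structure rather than per-column marginals.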
The other thing I’d like to do is compare it to the original VAE, which outputs one-hot encodings and uses MSE over the entire output vector. I think to do so I just need to modify the loss so that the categoricals are output in that form. I’m curious to see whether explicit category embeddings and cross-entropy loss make for a better-fitting model.
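For the baseline comparison, the loss change amounts to something like the following: one-hot encode the categorical targets, concatenate them onto the continuous targets, and take MSE over the whole vector. Again, names and shapes are my own illustration:

```python
import torch
import torch.nn.functional as F

def onehot_mse_loss(pred, cont_target, cat_targets, n_classes):
    """Baseline loss: categoricals as one-hot vectors appended to the
    continuous targets, with a single MSE over the full output vector.

    pred: (batch, n_cont + sum(n_classes)) model output.
    cat_targets: list of (batch,) long tensors of class indices.
    n_classes: list of class counts, one per categorical column.
    """
    onehots = [F.one_hot(t, c).float() for t, c in zip(cat_targets, n_classes)]
    target = torch.cat([cont_target] + onehots, dim=1)
    return F.mse_loss(pred, target)
```

Since both losses reduce to a single scalar, the comparison will have to be on reconstruction quality of the decoded samples rather than on the raw loss values, which aren’t on the same scale.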