I've played around with a few, but the one I'm settling on for now is MSE + MCE (Mean Cross Entropy), which is at least consistent across models of varying sizes.
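As a rough sketch of what I mean, here's a minimal version of the combined loss (the function name, shapes, and the unweighted sum are my own assumptions, not a fixed recipe):

```python
import numpy as np

def mse_mce_loss(cont_pred, cont_true, cat_logits, cat_true):
    """Hypothetical combined loss: MSE over the continuous outputs plus
    mean cross-entropy over the categorical logits (integer class labels)."""
    mse = np.mean((cont_pred - cont_true) ** 2)
    # numerically stable log-softmax over the class dimension
    z = cat_logits - cat_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    mce = -np.mean(log_probs[np.arange(len(cat_true)), cat_true])
    return mse + mce
```

Because both terms are means, the value doesn't blow up as the batch or model grows, which is the consistency-across-sizes property I care about.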

I tried some other metrics that tried to balance the error between the continuous and categorical elements, but the resulting loss values were hard to interpret.

So far I'm able to train the model to a validation loss of 0.51 for normally distributed continuous variables with a stdev of 1, which I think is good, but I'll need to compare the outputs.

It's still underfitting slightly, which may be the result of using both swap-column data augmentation and dropout, so I'll have to explore lowering the dropout. Eventually I'll need to run an ablation study.
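For reference, the swap-column augmentation I'm using looks roughly like this (the function name, the per-column swap probability, and the in-batch donor scheme are my assumptions about a common tabular corruption, not necessarily the exact variant):

```python
import numpy as np

def swap_column_augment(batch, p=0.1, rng=None):
    """Hypothetical swap-column augmentation: for each column, with
    probability p per row, replace that row's value with the value from
    a randomly chosen other row of the same batch."""
    rng = rng if rng is not None else np.random.default_rng()
    out = batch.copy()
    n, d = batch.shape
    for j in range(d):
        mask = rng.random(n) < p        # which rows get corrupted in column j
        donors = rng.integers(0, n, n)  # random donor row per entry
        out[mask, j] = batch[donors[mask], j]
    return out
```

Since this already injects noise into every feature, it plausibly overlaps with dropout's regularization effect, which is why I suspect the combination is what's causing the underfitting.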

The other thing I'd like to do is compare it to the original VAE, which outputs one-hot encodings and uses MSE over the entire output vector. I think to do so I just need to modify the loss so the categoricals are output in that form. I'm curious to see whether explicit category embeddings and cross-entropy loss produce a better-fitting model.
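The baseline loss I'd be comparing against is just this (a sketch, assuming a single categorical column concatenated after the continuous block; the names and layout are mine):

```python
import numpy as np

def one_hot_mse_loss(pred_vec, cont_true, cat_true, n_classes):
    """Hypothetical baseline: the categorical target is one-hot encoded,
    concatenated onto the continuous target, and the whole output vector
    is scored with plain MSE, as in the original VAE setup."""
    one_hot = np.eye(n_classes)[cat_true]                 # (batch, n_classes)
    target = np.concatenate([cont_true, one_hot], axis=1)
    return np.mean((pred_vec - target) ** 2)
```

The comparison should isolate whether cross-entropy's stronger gradient on the wrong-class logits actually helps, since the network capacity and targets are otherwise the same.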