Has anyone had success using embeddings for categorical variables (as in the Rossmann notebook from lesson 14)?

I have been testing the approach on a predictive problem at work with 56 continuous and 8 categorical variables (68 categories in total), but haven’t seen any improvement over just using XGBoost with one-hot encoded variables. I’ve tested a lot of different architectures (number of layers, dropout rate, batch norm) and the results all tend to be the same.

My theory is that the continuous variables in my problem are providing most of the predictive power, and that the slight reduction in dimensions for my categorical variables doesn’t do much. I was a little discouraged that this approach didn’t work here, so I was hoping other people might have success stories to share.
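
To put rough numbers on that theory, here is a small sketch comparing the input width under one-hot encoding versus embeddings. The per-variable cardinalities are hypothetical (I only know the totals: 8 variables, 68 categories), and the sizing rule of roughly half the cardinality, capped at 50, is just a common rule of thumb that I'm assuming here, not a tuned choice.

```python
# Hypothetical split of the 68 categories across the 8 categorical variables.
# Only the totals (8 variables, 68 categories, 56 continuous) come from the problem.
cardinalities = [2, 3, 4, 5, 7, 12, 15, 20]
n_continuous = 56


def embedding_size(n_categories, max_size=50):
    """Rule-of-thumb embedding width (an assumption): about half the cardinality, capped."""
    return min(max_size, (n_categories + 1) // 2)


# Sanity check that the hypothetical split matches the stated total.
assert sum(cardinalities) == 68

# One-hot: one input column per category, plus the continuous columns.
one_hot_width = n_continuous + sum(cardinalities)

# Embeddings: one small dense vector per categorical variable, plus the continuous columns.
embedding_width = n_continuous + sum(embedding_size(c) for c in cardinalities)

print("one-hot input width:  ", one_hot_width)
print("embedding input width:", embedding_width)
print("continuous share (one-hot):   %.2f" % (n_continuous / one_hot_width))
print("continuous share (embeddings): %.2f" % (n_continuous / embedding_width))
```

Under these assumed cardinalities the categorical part of the input shrinks only modestly, and the continuous columns make up a large share of the input either way, which would be consistent with the embeddings not changing much.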