Idea on categorical embeddings

I've been thinking about ways to make the entity embeddings of categorical variables more accurate, to get further accuracy gains in cases where I have limited training data. Could we first train the neural net to overfit on the training set, then use the embeddings from that run as the initialization when we actually train the generalized model (where we try to make val loss and train loss move together)? Would that represent the information in a better way, or am I just leaking the training data beforehand…?
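To make the idea concrete, here is a minimal sketch of the two-stage procedure in PyTorch. All names, sizes, and the toy data are hypothetical, purely to illustrate copying the overfit embedding weights into a fresh model before the "real" training:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: one categorical feature with 10 levels,
# embedded into 4 dims, feeding a tiny regression head.
n_levels, emb_dim = 10, 4

def make_model():
    return nn.Sequential(
        nn.Embedding(n_levels, emb_dim),  # index 0: the embedding table
        nn.Flatten(),
        nn.Linear(emb_dim, 1),
    )

x = torch.randint(0, n_levels, (64,))  # toy categorical inputs
y = torch.randn(64, 1)                 # toy targets

# Stage 1: deliberately overfit on the training set only.
stage1 = make_model()
opt = torch.optim.Adam(stage1.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(stage1(x), y)
    loss.backward()
    opt.step()

# Stage 2: a fresh model, but with its embedding table initialized
# from the overfit run; everything else starts from scratch. This
# model would then be trained with the usual regularization /
# early stopping against a validation set.
stage2 = make_model()
with torch.no_grad():
    stage2[0].weight.copy_(stage1[0].weight)
```

The open question is exactly the one above: since the stage-1 embeddings were shaped by every training row, whether this counts as a useful prior or as baked-in overfitting presumably depends on how strongly stage 2 is regularized.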
@jeremy, knowing your thoughts on this experiment would be really helpful before I dive into the experimentation.