I want to understand how embeddings work and how to calculate them.
In the example about embeddings for the states of Germany, I don't understand how the state embeddings are made.
To clarify my question, let's make an analogy with the movie-rating example in the collab section of the book: basically, we gave 5 latent features to the user ID (its embedding) and 5 features to the movie ID, and we used the movie-rating data to learn those latent features (with SGD as the optimizer).
What I'm missing in the analogy is: if the user ID is the states of Germany, what's the equivalent of the movie ID and the ratings?
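For reference, here is a minimal sketch of the collab setup I mean, with made-up toy ratings and sizes (the 5 latent factors match the book; everything else is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_factors = 4, 3, 5  # toy sizes; 5 latent factors as in the book

# Made-up (user_id, movie_id, rating) triples standing in for real ratings data
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (2, 2, 3.0), (3, 1, 2.0)]

# Each ID gets a learnable 5-dim vector: its embedding
U = rng.normal(0, 0.1, (n_users, n_factors))
M = rng.normal(0, 0.1, (n_movies, n_factors))

lr = 0.05
for epoch in range(1000):
    for u, m, r in ratings:
        pred = U[u] @ M[m]      # predicted rating = dot product of the two embeddings
        err = pred - r          # gradient of squared error w.r.t. pred
        gu, gm = err * M[m], err * U[u]
        U[u] -= lr * gu         # SGD step on both embeddings
        M[m] -= lr * gm

print(round(float(U[0] @ M[0]), 1))  # should be close to the observed rating 5.0
```

The ratings are the supervision signal: the embeddings are just parameters adjusted by SGD until the dot products reproduce the observed ratings.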
You can take a look at the Rossmann Store Sales | Kaggle competition here. Essentially, you want to predict Rossmann's sales from tabular data (number of customers, open date, competitor distance, Promo, PromoInterval, …). Then you build a model using entity embeddings, meaning that for each categorical feature you create a latent vector, and then stack all the latent vectors together into one long input feature vector.
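A minimal sketch of what "stack every latent vector together" means; the feature names, cardinalities, and embedding sizes below are made up for illustration, and in a real model the tables are learned jointly with the sales regression by SGD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical categorical features with their number of categories (made up)
cardinalities = {"Store": 10, "StateHoliday": 3, "PromoInterval": 4}
emb_sizes     = {"Store": 5,  "StateHoliday": 2, "PromoInterval": 3}

# One embedding table per categorical feature: shape (n_categories, emb_dim)
tables = {name: rng.normal(0, 0.1, (card, emb_sizes[name]))
          for name, card in cardinalities.items()}

def embed_row(codes):
    """Look up each category's vector and concatenate them into one input vector."""
    return np.concatenate([tables[name][idx] for name, idx in codes.items()])

x = embed_row({"Store": 7, "StateHoliday": 0, "PromoInterval": 2})
print(x.shape)  # 5 + 2 + 3 = 10 continuous inputs, fed to the rest of the network
```

So to answer the analogy question: the "ratings" here are the sales figures, and every categorical column plays the role the movie ID played, with its embedding learned as a side effect of predicting sales.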
To get the map of states you mentioned, I think we can run PCA with 2 components on the embedding vectors, then do some kind of clustering (or group the states closest to each other by computing their Euclidean distance). We would then expect states in the same cluster to also be close to each other geographically.
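A sketch of that idea, assuming we already have a trained embedding matrix for the 16 German states (a random stand-in here, so the "nearest" pairs are meaningless; only the mechanics are real):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for trained state embeddings: 16 states x 10 embedding dims
states = ["BW", "BY", "BE", "BB", "HB", "HH", "HE", "MV",
          "NI", "NW", "RP", "SL", "SN", "ST", "SH", "TH"]
E = rng.normal(size=(len(states), 10))

# PCA via SVD: center the data, then project onto the top-2 right singular vectors
Ec = E - E.mean(axis=0)
_, _, Vt = np.linalg.svd(Ec, full_matrices=False)
coords = Ec @ Vt[:2].T               # (16, 2) "map" of the states

# Pairwise Euclidean distances in the 2-D map; nearest neighbour per state
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
nearest = {s: states[j] for s, j in zip(states, d.argmin(axis=1))}
print(coords.shape)
```

With real trained embeddings, plotting `coords` and inspecting `nearest` is what produces the geographically sensible map from the book.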
Thanks! That was very helpful.
Regarding PCA, it assumes a Gaussian distribution of the samples. I wonder why the Kaggle data is Gaussian, and how can I know whether my own data is Gaussian?
Also, what other methods are there to compute principal components for non-Gaussian data?