Correct!
Does Jeremy have any rules of thumb on how big the embeddings should be?
Is there a known rule of thumb to choose the feature dimension of embedding matrix?
embeddings are just another encoding, not one-hot
He is using a formula but it is not clear.
Any advice on when to use embeddings over one-hot encodings?
Not a straightforward answer. He is probably going to try to explain it in the next few lectures.
When you have a categorical variable with very high cardinality, you're better off using an embedding, since one-hot encoding would produce a very sparse matrix.
In fact, the embedding approach produces a matrix in which the geometric distance between rows roughly tracks the semantic similarity between items, a nice by-product of this approach!
2^x >= cardinality
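A minimal sketch of that rule of thumb: pick the smallest x with 2^x >= cardinality. The helper name `embedding_dim` is hypothetical, just to illustrate the formula mentioned above.

```python
import math

def embedding_dim(cardinality: int) -> int:
    """Smallest x such that 2**x >= cardinality (the rule of thumb above)."""
    return max(1, math.ceil(math.log2(cardinality)))

# A categorical variable with 1000 distinct levels:
print(embedding_dim(1000))  # -> 10, since 2**10 = 1024 >= 1000
```

Other heuristics exist (fastai's library uses its own formula), so treat this as one option, not the definitive rule.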
Do we have to build an embedding matrix for every cat_var?
Why are we creating embeddings instead of just passing each row(observation) into the neural net?
The data are normalized first. Probably subtracted the minimum year to make them start from 0.
good point
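The normalization mentioned above could look something like this; the column name `Year` and the values are made up for illustration, and subtracting the minimum is just one plausible way to make the years start from 0.

```python
import pandas as pd

# Toy dataframe with a year column (hypothetical data)
df = pd.DataFrame({"Year": [1990, 2000, 2015]})

# Shift so the earliest year becomes 0
df["Year"] = df["Year"] - df["Year"].min()
print(df["Year"].tolist())  # -> [0, 10, 25]
```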
In a lot of cases it improves performance, because a dense vector encodes relations between the categories that aren't present in a one-hot encoding.
For example, a word embedding could put two words like 'cats' and 'lion' near each other, meaning the embeddings of those words are close, but in a one-hot encoding you don't have that.
Also, if you have a lot of categories, you can use embeddings to reduce the dimensionality.
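A toy illustration of that point, with made-up vectors: in a learned embedding, related categories can end up close together, while distinct one-hot vectors are all equally far apart (cosine similarity 0).

```python
import numpy as np

# Hypothetical learned embedding vectors for three categories
emb = {
    "cats": np.array([0.9, 0.1, 0.0]),
    "lion": np.array([0.8, 0.2, 0.1]),
    "car":  np.array([0.0, 0.9, 0.8]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cats"], emb["lion"]))  # high: related categories
print(cosine(emb["cats"], emb["car"]))   # low: unrelated categories

# One-hot: every pair of distinct categories is equally dissimilar.
onehot = np.eye(3)
print(cosine(onehot[0], onehot[1]))  # -> 0.0
```

Real embeddings are learned during training, of course; the vectors here are hand-picked just to show the geometry.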
You're right.
No, if you have something like a binary flag then you wouldn't want an embedding for it, because the data can't be represented more succinctly
I am not really talking about one-hot encoding, but about just passing the rows (observations) of our dataframe as single instances into the neural net.