Lesson 4 In-Class Discussion

Correct!

Does Jeremy have any rules of thumb on how big the embeddings should be?

1 Like

Is there a known rule of thumb to choose the feature dimension of embedding matrix?

2 Likes

Embeddings are an encoding, just not one-hot.

1 Like

He is using a formula, but it is not clear what it is.

1 Like
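For reference, the formula may be one of the embedding-size rules of thumb fastai has used; here's a sketch of both (the exact formulas are an assumption on my part, not confirmed from the lecture):

```python
# Two embedding-size heuristics fastai has used at different points.
# Both map a category's cardinality to a suggested embedding width.

def emb_size_v1(cardinality: int) -> int:
    # Older rule: roughly half the cardinality, capped at 50.
    return min(50, (cardinality + 1) // 2)

def emb_size_v2(cardinality: int) -> int:
    # Later rule: 1.6 * n**0.56, capped at 600.
    return min(600, round(1.6 * cardinality ** 0.56))

for n in (2, 7, 1000, 50_000):
    print(n, emb_size_v1(n), emb_size_v2(n))
```

Both grow sublinearly with cardinality, which matches the intuition that a 50,000-level variable doesn't need a 25,000-dimensional embedding.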

How is an embedding better than binary encoding or one-hot encoding for a category, @jeremy?

Any advice on when to use embeddings over one-hot encodings?

Not a straightforward answer. He will probably try to explain it in the next few lectures.

3 Likes

Looks like the entire class walked into this question. Great planning, @jeremy!!

1 Like

When you have a categorical variable with very high cardinality, you’re better off using an embedding, since one-hot encoding would produce a very sparse matrix.

In fact, the trained embedding matrix ends up being one in which the geometric distance between rows roughly tracks the semantic similarity between items: a by-product of this approach!

11 Likes
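To make the sparsity point concrete, here's a small sketch with a made-up cardinality of 50,000 (think store IDs; the numbers are illustrative, not from the lesson):

```python
import random

# Hypothetical categorical variable with 50,000 levels.
cardinality, emb_dim = 50_000, 77

# One-hot encoding: a 50,000-wide vector per row, all but one entry zero.
one_hot = [0.0] * cardinality
one_hot[1234] = 1.0

# Embedding: index into a (cardinality x emb_dim) lookup table and get a
# small dense vector instead. We fake one row with random values here;
# in a model these values are learned during training.
dense = [random.gauss(0.0, 1.0) for _ in range(emb_dim)]

print(len(one_hot), len(dense))  # 50000 vs 77
```

The network sees 77 informative numbers instead of 50,000 mostly-zero ones.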

2^x >= cardinality
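That rule amounts to taking the smallest x with 2^x >= cardinality, i.e. x = ceil(log2(cardinality)):

```python
import math

def emb_size_log2(cardinality: int) -> int:
    # Smallest x such that 2**x >= cardinality.
    return math.ceil(math.log2(cardinality))

print(emb_size_log2(1000))  # 10, since 2**10 = 1024 >= 1000
```

Note this gives much smaller sizes than the other heuristics floating around, so treat it as one suggestion among several.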

Do we have to build an embedding matrix for every cat_var?

Why are we creating embeddings instead of just passing each row(observation) into the neural net?

The data are normalized first; the minimum year was probably subtracted so they start from 0.

good point

In a lot of cases it improves performance, because a dense vector encodes relations between the categories that aren’t present in a one-hot encoding.
For example, an embedding of words could put two words, “cats” and “lion”, near each other, meaning the embeddings of those words are near, but in a one-hot encoding you don’t have that.
Also, if you have a lot of categories, you can use embeddings to reduce the dimension.

5 Likes
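A toy illustration of that point, using made-up vectors (not trained embeddings, just hypothetical values chosen to show the geometry):

```python
import math

# One-hot: every pair of distinct categories is exactly sqrt(2) apart,
# so "cats" is no closer to "lion" than to "car".
one_hot = {"cats": [1.0, 0.0, 0.0],
           "lion": [0.0, 1.0, 0.0],
           "car":  [0.0, 0.0, 1.0]}

# Hypothetical learned 2-d embeddings: related categories can be close.
emb = {"cats": [0.9, 0.1],
       "lion": [0.8, 0.2],
       "car":  [-0.7, 0.9]}

print(math.dist(one_hot["cats"], one_hot["lion"]))  # ~1.414
print(math.dist(one_hot["cats"], one_hot["car"]))   # ~1.414, identical
print(math.dist(emb["cats"], emb["lion"]))          # small
print(math.dist(emb["cats"], emb["car"]))           # much larger
```

The one-hot distances carry no information about which categories are related; the embedding distances can, once trained.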

You’re right.

No, if you have something like a binary flag, then you wouldn’t want an embedding for it, because the data can’t be represented more succinctly.

I am not really talking about one-hot encoding, but just passing the rows of our dataframe into the neural net as single instances.