Correct!
Does Jeremy have any rules of thumb on how big the embeddings should be?
Is there a known rule of thumb to choose the feature dimension of embedding matrix?
embeddings are just another encoding, not one-hot
He is using a formula but it is not clear.
Any advice on when to use embeddings over one-hot encodings?
Not a straightforward answer. He is probably going to try to explain it in the next few lectures.
When you have a categorical variable with very high cardinality, you're better off using an embedding, since one-hot encoding would produce a very sparse matrix.
In fact, the embedding approach produces a matrix in which the geometric distance between rows roughly tracks the semantic similarity between items, a nice by-product of this approach!
2^x >= cardinality
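A minimal sketch of that rule of thumb: pick the smallest x with 2^x >= cardinality. The helper name `embedding_dim` is hypothetical, just to illustrate the formula mentioned above.

```python
import math

def embedding_dim(cardinality: int) -> int:
    """Smallest x such that 2**x >= cardinality (the rule of thumb above)."""
    return max(1, math.ceil(math.log2(cardinality)))

# A categorical variable with 1000 distinct levels:
print(embedding_dim(1000))  # -> 10, since 2**10 = 1024 >= 1000
```

Other heuristics exist (fastai's library uses its own formula), so treat this as one option, not the definitive rule.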
Do we have to build an embedding matrix for every cat_var?
Why are we creating embeddings instead of just passing each row(observation) into the neural net?
The data are normalized first. Probably subtracted the minimum year to make them start from 0.
good point
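The normalization mentioned above could look something like this; the column name `Year` and the values are made up for illustration, and subtracting the minimum is just one plausible way to make the years start from 0.

```python
import pandas as pd

# Toy dataframe with a year column (hypothetical data)
df = pd.DataFrame({"Year": [1990, 2000, 2015]})

# Shift so the earliest year becomes 0
df["Year"] = df["Year"] - df["Year"].min()
print(df["Year"].tolist())  # -> [0, 10, 25]
```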
In a lot of cases it improves performance, because a dense vector encodes relations between the categories that aren't present in a one-hot encoding.
For example, a word embedding could put two words like 'cats' and 'lion' near each other, meaning the embeddings of those words are close, but in a one-hot encoding you don't have that.
Also, if you have a lot of categories, you can use embeddings to reduce the dimensionality.
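A toy illustration of that point, with made-up vectors: in a learned embedding, related categories can end up close together, while distinct one-hot vectors are all equally far apart (cosine similarity 0).

```python
import numpy as np

# Hypothetical learned embedding vectors for three categories
emb = {
    "cats": np.array([0.9, 0.1, 0.0]),
    "lion": np.array([0.8, 0.2, 0.1]),
    "car":  np.array([0.0, 0.9, 0.8]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cats"], emb["lion"]))  # high: related categories
print(cosine(emb["cats"], emb["car"]))   # low: unrelated categories

# One-hot: every pair of distinct categories is equally dissimilar.
onehot = np.eye(3)
print(cosine(onehot[0], onehot[1]))  # -> 0.0
```

Real embeddings are learned during training, of course; the vectors here are hand-picked just to show the geometry.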
You're right.
No, if you have something like a binary flag then you wouldn't want an embedding for it, because the data can't be represented more succinctly
I am not really talking about one-hot encoding, but about just passing the rows (observations) of our dataframe as single instances into the neural net.