Lesson 4 In-Class Discussion

One-hot encoding says that every class has NO correlation with any other. But that’s not true for day of the week: Sunday has some correlation with Saturday, since on both days we don’t work (or work a little less). Embeddings let us express this kind of correlation.

12 Likes
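A minimal PyTorch sketch (not from the lesson) of the contrast: one-hot vectors are always orthogonal, while an embedding gives each day a small dense vector that training is free to move, so similar days can end up close together.

```python
import torch
import torch.nn as nn

# One-hot: every day is orthogonal to every other, so "Sunday is like
# Saturday" can never be expressed in the representation itself.
one_hot = torch.eye(7)  # rows are Mon..Sun

# Embedding: each day gets a small dense vector that SGD can move, so days
# that behave similarly (e.g. Sat/Sun) may drift close together in training.
day_emb = nn.Embedding(num_embeddings=7, embedding_dim=4)

sat, sun = torch.tensor(5), torch.tensor(6)
print(torch.cosine_similarity(day_emb(sat), day_emb(sun), dim=0))
```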

rich representations - brilliant!

3 Likes

but in categorical variables those are strings…

1 Like

Why is 0.04 used as the dropout for the early layer? I thought 0.25 or 0.5 was standard. Why is [0.001, 0.01] used as the dropout for the later layers?

1 Like

Is there a reason why [1000, 500] is used for the number of activations? Is there a heuristic for knowing what numbers to use?

3 Likes
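For context, a rough plain-PyTorch sketch of the kind of tabular network these numbers describe (the embedding width and number of continuous columns are made up here): 0.04 is the dropout applied to the embedding outputs, and [0.001, 0.01] are the dropouts after the 1000- and 500-unit layers.

```python
import torch.nn as nn

n_emb, n_cont = 200, 16   # assumed sizes, just for illustration

model = nn.Sequential(
    nn.Dropout(0.04),                  # dropout on the embedding outputs
    nn.Linear(n_emb + n_cont, 1000),   # first hidden layer: 1000 activations
    nn.ReLU(),
    nn.Dropout(0.001),
    nn.Linear(1000, 500),              # second hidden layer: 500 activations
    nn.ReLU(),
    nn.Dropout(0.01),
    nn.Linear(500, 1),                 # single regression output
)
```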

Seems like the embedding matrices are the inverse of dimensionality reduction, like the opposite of principal components.

3 Likes

I would recommend looking into the word2vec paper / blog posts. They explain and demonstrate embeddings quite well. In any case, it is a great way to reduce dimensionality and represent information more efficiently.

4 Likes

I don’t know if I understand. Are you saying that if, for example, the year is 2014 (considered as a category), we should just give 2014 to the neural net, without one-hot encoding or an embedding?

I think Chris Moody’s blog has a great explanation of word2vec. Crudely speaking, one can extend the “word” to any categorical variable.

Link : multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/

3 Likes

I was wondering whether the data from the embedding matrix, after training, could be used for some kind of clustering.

Yeah, but I got it now; it doesn’t make sense for neural nets, I guess.

1 Like

Yep, that’s exactly what happens with something like word2vec … I suspect we’ll get onto it later today.
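As a rough sketch (hypothetical day-of-week embedding, scikit-learn KMeans), the trained embedding matrix is just an array of points that can be clustered like any other:

```python
import torch.nn as nn
from sklearn.cluster import KMeans

day_emb = nn.Embedding(7, 4)   # pretend this has already been trained

# The learned weights are just a (7 x 4) array of floats.
weights = day_emb.weight.detach().numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(weights)
print(labels)  # might separate weekdays from the weekend, if that was learned
```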

Why not take the output dimension as ceil(log2(cardinality)) instead of min(50, cardinality/2)?
We can represent the days of the week with a 3-bit number.
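For comparison, the two rules of thumb give quite different sizes once the cardinality grows (using cardinality // 2 to stand in for the heuristic mentioned above):

```python
from math import ceil, log2

for c in (7, 12, 31, 1000):
    print(c, ceil(log2(c)), min(50, c // 2))
# 7    -> 3 vs 3
# 12   -> 4 vs 6
# 31   -> 5 vs 15
# 1000 -> 10 vs 50
```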

If there is a hierarchy in the categorical variables, what are some improvements one can make when doing the embedding?

@jeremy nailed the explanation!!

with the one-hot encoding expressed as a matrix product with the embedding matrix

4 Likes
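A tiny PyTorch check of that equivalence (a made-up 7-category embedding): looking up row 6 gives the same vector as multiplying its one-hot encoding by the embedding matrix.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(7, 4)        # 7 categories, 4-dim embedding
idx = torch.tensor(6)           # e.g. "Sunday"

one_hot = torch.zeros(7)
one_hot[idx] = 1.0

lookup  = emb(idx)              # what the embedding layer actually does
product = one_hot @ emb.weight  # one-hot vector times the embedding matrix

print(torch.allclose(lookup, product))  # True
```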

What is the process behind creating a custom function like datepart()? Will this be taught in future lessons?

Why choose 1000 and 500 as the number of activations in the layers? And why 2 layers?

Try checking out the fastai source code. For this example, look at structured.py, under the add_datepart method.

2 Likes
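A much-simplified sketch of the idea (the real add_datepart in structured.py adds more fields and handles the bookkeeping), assuming a pandas DataFrame with a datetime column:

```python
import pandas as pd

def datepart(df, col):
    """Simplified sketch of fastai's add_datepart: expand a datetime
    column into several date-derived feature columns."""
    d = pd.to_datetime(df[col])
    df[col + '_Year'] = d.dt.year
    df[col + '_Month'] = d.dt.month
    df[col + '_Week'] = d.dt.isocalendar().week
    df[col + '_Day'] = d.dt.day
    df[col + '_Dayofweek'] = d.dt.dayofweek
    df[col + '_Is_month_end'] = d.dt.is_month_end
    df[col + '_Elapsed'] = d.astype('int64') // 10**9  # seconds since epoch
    return df
```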

You can read the code in the library. I don’t think it will be covered here.

2 Likes

I thought embeddings were more like one-hot encoding, which is a permanent label that doesn’t change. If SGD changes the embedding, what’s the point? Isn’t the purpose of an embedding just to be an encoding? Why constantly change this encoding of “Sunday”?