Lesson 4 In-Class Discussion

One-hot encoding says that every class has NO correlation with any other. But that’s not true for day of the week: Sunday has some correlation with Saturday, since on both days we don’t work (or work a little less). Embeddings let us express this kind of correlation.

12 Likes
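A minimal PyTorch sketch (not from the lesson) of the contrast: one-hot vectors are always orthogonal, while an embedding gives each day a small dense vector that training is free to move, so similar days can end up close together.

```python
import torch
import torch.nn as nn

# One-hot: every day is orthogonal to every other, so "Sunday is like
# Saturday" can never be expressed in the representation itself.
one_hot = torch.eye(7)  # rows are Mon..Sun

# Embedding: each day gets a small dense vector that SGD can move, so days
# that behave similarly (e.g. Sat/Sun) may drift close together in training.
day_emb = nn.Embedding(num_embeddings=7, embedding_dim=4)

sat, sun = torch.tensor(5), torch.tensor(6)
print(torch.cosine_similarity(day_emb(sat), day_emb(sun), dim=0))
```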

rich representations - brilliant!

3 Likes

but in categorical variables those are strings…

1 Like

Why is 0.04 used as the dropout for the early layer? I thought 0.25 or 0.5 was standard. Why is [0.001, 0.01] used as the dropout for the later layers?

1 Like

Is there a reason why [1000, 500] is used for the number of activations? Is there a heuristic for knowing what numbers to use?

3 Likes
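For context, a rough plain-PyTorch sketch of the kind of tabular network these numbers describe (the embedding width and number of continuous columns are made up here): 0.04 is the dropout applied to the embedding outputs, and [0.001, 0.01] are the dropouts after the 1000- and 500-unit layers.

```python
import torch.nn as nn

n_emb, n_cont = 200, 16   # assumed sizes, just for illustration

model = nn.Sequential(
    nn.Dropout(0.04),                  # dropout on the embedding outputs
    nn.Linear(n_emb + n_cont, 1000),   # first hidden layer: 1000 activations
    nn.ReLU(),
    nn.Dropout(0.001),
    nn.Linear(1000, 500),              # second hidden layer: 500 activations
    nn.ReLU(),
    nn.Dropout(0.01),
    nn.Linear(500, 1),                 # single regression output
)
```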

Seems like the embedding matrices are the inverse of dimensionality reduction, like the opposite of principal components.

3 Likes

I would recommend looking into the word2vec paper / blog posts. They explain and demonstrate embeddings quite well. In any case, it is a great way to reduce dimensionality and represent information more efficiently.

4 Likes

I don’t know if I understand. Are you saying that if, for example, the year is 2014 (considered as a category), we should just give 2014 to the neural net, without one-hot encoding or an embedding?

I think Chris Moody’s blog has a great explanation of word2vec. Crudely speaking, one can extend the “word” to any categorical variable.

Link : multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/

3 Likes

I was wondering whether the data from the embedding matrix, after training, could be used for some kind of clustering.

Yeah, but I got it now; it doesn’t make sense for neural nets, I guess.

1 Like

Yep, that’s exactly what happens with something like word2vec … I suspect we’ll get onto it later today.
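As a rough sketch (hypothetical day-of-week embedding, scikit-learn KMeans), the trained embedding matrix is just an array of points that can be clustered like any other:

```python
import torch.nn as nn
from sklearn.cluster import KMeans

day_emb = nn.Embedding(7, 4)   # pretend this has already been trained

# The learned weights are just a (7 x 4) array of floats.
weights = day_emb.weight.detach().numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(weights)
print(labels)  # might separate weekdays from the weekend, if that was learned
```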

Why not take the output dimension as ceil(log2(cardinality)) instead of min(50, cardinality/2)?
We can represent the days of the week with a 3-bit number.
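For comparison, the two rules of thumb give quite different sizes once the cardinality grows (using cardinality // 2 to stand in for the heuristic mentioned above):

```python
from math import ceil, log2

for c in (7, 12, 31, 1000):
    print(c, ceil(log2(c)), min(50, c // 2))
# 7    -> 3 vs 3
# 12   -> 4 vs 6
# 31   -> 5 vs 15
# 1000 -> 10 vs 50
```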

If there is a hierarchy in the categorical variables, what are some improvements one can make when doing the embedding?

@jeremy nailed the explanation!!

with the one-hot encoding expressed as a matrix product with the embedding matrix

4 Likes
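A tiny PyTorch check of that equivalence (a made-up 7-category embedding): looking up row 6 gives the same vector as multiplying its one-hot encoding by the embedding matrix.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(7, 4)        # 7 categories, 4-dim embedding
idx = torch.tensor(6)           # e.g. "Sunday"

one_hot = torch.zeros(7)
one_hot[idx] = 1.0

lookup  = emb(idx)              # what the embedding layer actually does
product = one_hot @ emb.weight  # one-hot vector times the embedding matrix

print(torch.allclose(lookup, product))  # True
```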

What is the process behind creating a custom function like datepart()? Will this be taught in future lessons?

Why choose 1000 and 500 as the number of activations in the layers? And why 2 layers?

Try checking out the fastai source code. For this example, look at structured.py, under the add_datepart method.

2 Likes
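A much-simplified sketch of the idea (the real add_datepart in structured.py adds more fields and handles the bookkeeping), assuming a pandas DataFrame with a datetime column:

```python
import pandas as pd

def datepart(df, col):
    """Simplified sketch of fastai's add_datepart: expand a datetime
    column into several date-derived feature columns."""
    d = pd.to_datetime(df[col])
    df[col + '_Year'] = d.dt.year
    df[col + '_Month'] = d.dt.month
    df[col + '_Week'] = d.dt.isocalendar().week
    df[col + '_Day'] = d.dt.day
    df[col + '_Dayofweek'] = d.dt.dayofweek
    df[col + '_Is_month_end'] = d.dt.is_month_end
    df[col + '_Elapsed'] = d.astype('int64') // 10**9  # seconds since epoch
    return df
```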

You can read the code in the library. I don’t think it will be covered here.

2 Likes

I thought embeddings were more like one-hot encoding, which is a permanent label that doesn’t change. If SGD changes the embedding, what’s the point? Isn’t the purpose of an embedding just to be an encoding? Why constantly change this encoding of “Sunday”?