I think the inputs don’t change, but their embeddings do.
As an example, say you have a categorical variable Species: [Dog, Cat, Wolf, Pig], and two continuous variables, weight and height.
You have a model that takes in 3 inputs: Species, weight, and height.
First we randomly initialize their embeddings:
Dog: [0.032, 0.02, -0.053]
Cat: [0.07, 0.02, -0.093]
Wolf: [0.029, 0.02, -0.053]
Pig: [0.01, 0.02, -0.037]
When you have a pig, your inputs to the model will be
[0.01, 0.02, -0.037], 100 kg, 60 cm
When you have a cat, your inputs to the model will be
[0.07, 0.02, -0.093], 3 kg, 10 cm
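This lookup-and-concatenate step can be sketched in a few lines of NumPy. The embedding vectors and the weight/height values are the illustrative numbers from above; a real model would store the table as a trainable parameter matrix rather than a dict.

```python
import numpy as np

# one randomly initialized 3-dim embedding per category
embeddings = {
    "Dog":  np.array([0.032, 0.02, -0.053]),
    "Cat":  np.array([0.07,  0.02, -0.093]),
    "Wolf": np.array([0.029, 0.02, -0.053]),
    "Pig":  np.array([0.01,  0.02, -0.037]),
}

def model_input(species, weight_kg, height_cm):
    # look up the species embedding and append the continuous features
    return np.concatenate([embeddings[species], [weight_kg, height_cm]])

# a pig becomes a 5-number vector: 3 embedding dims + weight + height
print(model_input("Pig", 100, 60))
```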
In the beginning, the embeddings will be bad; with training, they will start to represent certain characteristics of these animals.
After some training, their embeddings might become:
Dog: [0.9, 0.72, 0.83]
Cat: [0.07, 0.85, 0.93]
Wolf: [0.9, -0.5, 0.77]
Pig: [0.1, 0.66, -0.4]
If I were to guess, I think the embeddings represent [genetic proximity, aggression, noise-making]
so that same pig will now be
[0.1, 0.66, -0.4], 100 kg, 60 cm
and the cat will now be
[0.07, 0.85, 0.93], 3 kg, 10 cm
The cat remains a cat, and the pig remains a pig. But their embeddings have changed to better reflect certain characteristics.
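The mechanics of "the inputs stay fixed but the embeddings move" can be sketched with a toy training loop. Here the embedding table is treated as a trainable parameter alongside the classifier weights; the task (predicting whether each animal is a canine) and all the numbers are made up for illustration, not taken from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
species = ["Dog", "Cat", "Wolf", "Pig"]
emb = rng.normal(scale=0.05, size=(4, 3))   # trainable embedding table, one row per species
w = rng.normal(scale=0.05, size=3)          # trainable classifier weights
y = np.array([1.0, 0.0, 1.0, 0.0])          # toy target: is it a canine?

initial = emb.copy()
lr = 0.5
for _ in range(200):
    p = 1 / (1 + np.exp(-(emb @ w)))        # sigmoid of the logits
    err = p - y                             # dLoss/dlogits for binary cross-entropy
    grad_w = emb.T @ err / 4
    grad_emb = np.outer(err, w) / 4
    w -= lr * grad_w
    emb -= lr * grad_emb                    # the embedding rows get gradient updates too

# the categories (Dog, Cat, ...) never changed, but their vectors did
print(np.allclose(emb, initial))  # False
```

The key point is the last gradient step: because the embedding rows receive gradients just like any other weight, they drift toward whatever representation makes the downstream task easier.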