Wiki: Lesson 5

What does the importance of each movie review mean in this statement? Since movie reviews are represented by ratings, I may be conflating the two, but I would like to get more clarity here.

As for this 0.5 padding, I think it is a great example of @jeremy at his best. When he tells the story, everything looks natural and straightforward. But from time to time these magic constants pop up. He says they could be arbitrary: if we need a number here, we just have to pick one, he says. Let it be 3, or 0.5, or whatever, it is not important, he says. And voilà, we’ve surpassed the state of the art (again). But when you try to reimplement it yourself, you quickly realize that any other value leads to worse results. And you have no idea how this exact one was chosen, or how to choose another one for your own specific task. And I think this is exactly why @jeremy was a Kaggle champion so many times: his ability to discover such magic numbers. :slight_smile:
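For concreteness, here is a minimal sketch of where the 0.5 enters, as I understand the trick, assuming ratings in [1, 5]; the function name and defaults are mine, not the lesson notebook's:

```python
import torch

def scaled_sigmoid(score, min_rating=1.0, max_rating=5.0, pad=0.5):
    """Squash a raw score into (min_rating - pad, max_rating + pad).

    The pad is the 'magic' 0.5: without it, predicting an extreme
    rating like 5.0 would need the raw score to go to infinity,
    since sigmoid only reaches its bounds asymptotically.
    """
    low, high = min_rating - pad, max_rating + pad
    return torch.sigmoid(score) * (high - low) + low

# A score of 0 maps to the midpoint 3.0; large scores approach 5.5.
print(scaled_sigmoid(torch.tensor([0.0, 4.0, -4.0])))
```

With pad=0.5 a real rating of 5 sits at a finite, reachable activation instead of the asymptote, which is presumably why shrinking or growing the pad hurts results.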

Talking about new users and new movies, we are mostly talking about the cold start problem ( https://en.wikipedia.org/wiki/Cold_start_(computing) ). To avoid repeating myself, here is a link to an answer to a similar question from the previous year's (2017) version of the course:
Making predictions with collaborative filtering


Thanks Lator


Hi,

at 1:23:00 in this lesson, when you say nu+nm (written in dark green), should it be nu_factors + nm_factors, which here happen to be the same value?
Thank you very much!

I’m having trouble understanding how we update the elements of the embedding matrices under batch gradient descent.

Let’s say I have randomly initialized an embedding matrix for day of week as a 7x4 matrix. I then use SGD to update my weights and this embedding matrix (updating my NN weights after every training example). My first training example has a Tuesday, so I replace that Tuesday with the corresponding embedding vector and feed it to my NN; then during backprop, I allow the embedding vector for Tuesday to be updated as well.

Then how do I update the embedding matrix if I want to use batch gradient descent? Batch GD only updates the weights after a certain number of training examples have been fed to my NN, say after 7 training examples, each with a different day of the week. After backprop, which embedding vectors would be updated?

Yes, that’s true; it was already discussed and answered above in this thread: Wiki: Lesson 5

From the perspective of the embedding matrix, it doesn’t matter whether the embeddings are updated after each training example (online learning) or after each minibatch (minibatch SGD). The rows that were looked up in the batch receive gradients, and they are updated no differently from layer weights.
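To make that concrete, here is a toy PyTorch sketch (my own example, not the lesson's code) using the 7x4 day-of-week embedding from the question above. After backprop on a minibatch, only the rows whose indices appeared in the batch have nonzero gradients, and a repeated index accumulates gradient from each occurrence:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(7, 4)                # 7 days of week, 4-dim embedding
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

days = torch.tensor([1, 3, 3, 5])       # one minibatch; index 3 appears twice
target = torch.randn(4, 4)              # dummy targets for a toy loss

loss = ((emb(days) - target) ** 2).mean()
loss.backward()

# emb.weight.grad is a full 7x4 tensor, but only rows 1, 3, and 5
# are nonzero; row 3 has accumulated gradient from both occurrences.
print(emb.weight.grad)

opt.step()  # rows 1, 3, 5 move; rows 0, 2, 4, 6 stay unchanged
```

So in your 7-example batch, all seven day-of-week vectors would receive a gradient and all seven would be updated on the optimizer step, each from the example(s) in which it appeared.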