
Is it possible to do a similar collaborative filtering calculation for 3 or more variables in Excel, and if so, how? Would it just be the dot product of 3 matrices, or is there something more to it?
How does gradient descent work for 3 or more variables (it was mentioned in the lecture that apart from product and user we could have something else)?
I want to check whether the gradient descent in the example is calculated for this whole matrix, which cannot be drawn like the picture from earlier in lesson 3 (around the 1:20 h mark). There we had a 2-dimensional plot and Jeremy was showing how the learning rate affects the search for the minimum (overshooting, small hops). Was that picture for a 2x1 vector? I'm just not sure how to understand this case in terms of that picture.
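To make the first question concrete, here is a minimal sketch of what I imagine a "3 variable" version could look like. This is only my guess, not something from the lecture: the factor values are made up, `time_slot` is a hypothetical third variable, and multiplying the three factor vectors element-wise before summing is just one possible generalization of the dot product.

```python
# Made-up latent factors, 3 per entity (illustration only, not lecture data).
user = [0.9, 0.1, 0.5]
movie = [0.8, 0.3, 0.4]
time_slot = [1.0, 0.5, 0.2]  # hypothetical third variable


def two_way(u, m):
    # Ordinary dot product, as in the spreadsheet example with user x movie.
    return sum(a * b for a, b in zip(u, m))


def three_way(u, m, t):
    # Assumed generalization: multiply matching factors of all three, then sum.
    return sum(a * b * c for a, b, c in zip(u, m, t))


pred2 = two_way(user, movie)
pred3 = three_way(user, movie, time_slot)
print(pred2, pred3)
```

If this is roughly right, then in Excel it would be a SUMPRODUCT over three ranges instead of two, and gradient descent would update all three sets of factors at once.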

How does one pixel in the lesson 4 example predict the whole number? Should we sum the probabilities or average them over all the pixels in the image to get the number?

I'm not sure I understand the purpose of the sigmoid function. Earlier in the lecture Jeremy mentions that the output represents the probabilities for each class. However, with the sigmoid function the last vector is converted to numbers in the range 0 to 5, so they are not really probabilities. What are they then, and how does this compare to the output in image recognition?
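This is how I currently picture the sigmoid step, as a small sketch. It assumes the raw output is squashed into (0, 1) by a standard sigmoid and then stretched to the rating range; the function name `scaled_sigmoid` and the `low`/`high` parameters are my own, not taken from the lecture code.

```python
import math


def sigmoid(x):
    # Standard logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def scaled_sigmoid(x, low=0.0, high=5.0):
    # Assumed rescaling: stretch the (0, 1) output to the (low, high) range.
    return low + (high - low) * sigmoid(x)


for raw in (-4.0, 0.0, 4.0):
    print(raw, round(scaled_sigmoid(raw), 3))
```

So under this assumption the output looks like a predicted rating squeezed into a range, rather than a probability for a class.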