I’ve been trying to wrap my head around the neural network implementation of collaborative filtering. Is there a significant difference between the two algorithms, other than that (a) we now have regularization, and (b) we are using stochastic gradient descent instead of the spreadsheet’s finite differencing?
It looks like the number of weights is the same: latent_factors × n_users + latent_factors × n_movies, unless I am missing something.
Also, what is finite differencing exactly? Is it basically that, instead of computing gradients with derivatives, we estimate them by seeing whether small increases or decreases in a weight change the objective function?
Yes, that all sounds right, although note that the spreadsheet only has 5 factors, whereas the Keras model has 50.
Finite differencing is the first method shown in the graddesc spreadsheet; it is exactly as you describe.
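To make that concrete, here’s a minimal sketch of finite differencing on a made-up one-weight loss, `f(w) = (w - 3)**2` (this toy function and the helper names are my own illustration, not the spreadsheet’s actual objective):

```python
def loss(w):
    # Toy objective: minimized at w = 3.
    return (w - 3.0) ** 2

def finite_diff_grad(f, w, eps=1e-6):
    # Nudge the weight up and down and see how the loss changes,
    # instead of differentiating analytically (central difference).
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 1.0
approx = finite_diff_grad(loss, w)   # finite-difference estimate
exact = 2 * (w - 3.0)                # analytic derivative, for comparison
print(approx, exact)
```

Both numbers come out around -4.0, which is exactly the "does wiggling the weight increase or decrease the objective" idea you described.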
That’s helpful! This makes me wonder: is the main problem with vanilla collaborative filtering just that it tends to overfit?
@ben.bowles sorry, on re-reading your question, I mis-spoke. The ‘dot product’ and ‘bias’ sections in the notebook show methods that are very much like standard collaborative filtering methods. The ‘neural net’ section is not like them, however. If you run that section and call `nn.summary()`, you’ll see that the model has many more parameters, since each dense layer has its own set of weights.
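A rough back-of-the-envelope count shows where the extra parameters come from. The sizes below (671 users, 9066 movies, 50 factors, one hidden layer of 70 units) are illustrative choices of mine, not necessarily the notebook’s exact settings:

```python
# Illustrative sizes (assumptions, not the notebook's exact configuration).
n_users, n_movies, n_factors = 671, 9066, 50

# Dot-product model: just the two embedding tables.
dot_params = n_factors * n_users + n_factors * n_movies

# Bias model: same, plus one bias per user and per movie.
bias_params = dot_params + n_users + n_movies

# Neural-net model: same embeddings, but the concatenated factors feed
# dense layers, each with its own weight matrix and bias vector.
hidden = 70
nn_params = (n_factors * n_users + n_factors * n_movies
             + (2 * n_factors) * hidden + hidden   # first dense layer
             + hidden * 1 + 1)                     # output layer

print(dot_params, bias_params, nn_params)
```

The embedding tables dominate either way, but the dense layers add weights the dot-product model simply doesn’t have, which is why `nn.summary()` reports a larger total.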
Thank you. I will have to look at this more closely.
@jeremy Is the dot product method also called the Alternating Least Squares (ALS) method for collaborative filtering?
Not exactly. That’s an alternative way to optimize it, instead of SGD.
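For intuition, here’s a minimal sketch of the ALS alternation on a dense toy ratings matrix (real ALS handles missing entries and regularization; the sizes and the use of plain least squares here are simplifying assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 4, 5, 2
R = rng.random((n_users, n_movies))   # toy dense "ratings" matrix

U = rng.random((n_users, k))          # user latent factors
M = rng.random((n_movies, k))         # movie latent factors

err0 = np.abs(U @ M.T - R).mean()     # reconstruction error before fitting

for _ in range(20):
    # Alternate: fix M and solve for U in closed form (least squares),
    # then fix U and solve for M. Each half-step is a linear problem.
    U = np.linalg.lstsq(M, R.T, rcond=None)[0].T
    M = np.linalg.lstsq(U, R, rcond=None)[0].T

err = np.abs(U @ M.T - R).mean()      # error after alternating
print(err0, err)
```

The point is that each half-step has an exact closed-form solution, whereas SGD nudges all the factors a little at a time; both are just different optimizers for the same latent-factor model.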
The core idea of starting with random movie latent factors and user latent factors is the same across collaborative filtering techniques; only the optimization methods vary (GD, SGD, ALS, etc.), right?
Exactly right. Except for the neural net example we provided, which uses a somewhat different approach.