Collaborative filtering versus neural network implementation

ben.bowles · November 22, 2016, 12:39am

I’ve been trying to wrap my head around the neural network implementation of collaborative filtering. Is there a significant difference between the two algorithms, other than that a) we now have regularization, and b) we are using stochastic gradient descent rather than the finite differencing from the spreadsheet?

It looks like the number of weights is the same: latent_factor X n_users + latent_factor X n_movies. Unless I am missing something.

Also, what is finite differencing exactly? Is it basically that, instead of computing gradients analytically with derivatives, we estimate them by seeing how small increases or decreases in a weight change the objective function?

jeremy · November 22, 2016, 1:00am

Yes, that all sounds right, although note that the spreadsheet only has 5 factors, whereas the Keras model has 50.

Finite differencing is the first method shown in the graddesc spreadsheet, and it is exactly as you describe.
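
To make the idea concrete, here is a minimal sketch of finite differencing on a toy one-weight model. The `loss` function, the data point, and the step size `eps` are all assumptions made up for illustration; the spreadsheet's exact formula may differ (for example, it may nudge the weight in one direction only, whereas this sketch uses a central difference).

```python
def loss(w):
    # Toy objective: squared error of the model y = w * x on one example.
    x, y = 3.0, 6.0
    return (w * x - y) ** 2


def finite_difference_grad(f, w, eps=1e-6):
    # Nudge the weight up and down by eps and measure how the loss changes.
    return (f(w + eps) - f(w - eps)) / (2 * eps)


def analytic_grad(w):
    # Derivative of (w * x - y)^2 with respect to w, worked out by hand.
    x, y = 3.0, 6.0
    return 2 * (w * x - y) * x


w = 0.5
numeric = finite_difference_grad(loss, w)
exact = analytic_grad(w)
print(numeric, exact)
```

The two numbers should agree closely. The practical drawback of finite differencing is that it needs extra evaluations of the loss for every weight, which is why derivative-based gradients are used once a model has many weights.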

ben.bowles · November 22, 2016, 1:04am

That’s helpful! This makes me wonder: is the main problem with vanilla collaborative filtering just that it tends to overfit?

jeremy · November 22, 2016, 2:00am

@ben.bowles sorry, on re-reading your question, I misspoke. The ‘dot product’ and ‘bias’ sections in the notebook show methods that are very much like standard collaborative filtering methods. The ‘neural net’ section is not like them, however. If you run that section and do a ‘nn.summary()’ you’ll see that the model has many more parameters, since each dense layer has its own set of weights.
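
As a rough illustration of where the extra parameters come from, the arithmetic below counts weights for the three kinds of model. All sizes here are hypothetical, and the neural net is assumed to be the two embeddings concatenated and fed through one hidden dense layer and one output unit; the notebook's actual architecture and numbers may differ.

```python
# Hypothetical sizes, chosen only for illustration.
n_users, n_movies = 1000, 500
n_factors = 50   # latent factors per user and per movie
n_hidden = 70    # assumed width of a single hidden dense layer

# Dot product model: one embedding matrix for users, one for movies.
embedding_params = n_factors * n_users + n_factors * n_movies

# Bias model: the same embeddings plus one scalar bias per user and per movie.
bias_params = embedding_params + n_users + n_movies

# Neural net: the same embeddings plus dense layers on the concatenated vectors.
hidden_layer = (2 * n_factors) * n_hidden + n_hidden  # weights + biases
output_layer = n_hidden * 1 + 1                       # weights + bias
nn_params = embedding_params + hidden_layer + output_layer

print(embedding_params, bias_params, nn_params)
```

The embedding count matches the `latent_factor X n_users + latent_factor X n_movies` formula from the first post; everything beyond that in the last line comes from the dense layers.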

ben.bowles · November 22, 2016, 7:44pm

Thank you. I will have to look at this more closely.

janardhanp22 · November 23, 2016, 12:10am

@jeremy Is the dot product method also called the Alternating Least Squares (ALS) method for collaborative filtering?

jeremy · November 23, 2016, 12:11am

Not exactly. ALS is an alternative way to optimize the same model, instead of SGD.
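
A small sketch of that distinction, under heavy simplifying assumptions: the ratings are made up, there is only one latent factor per user and per movie (so each ALS step has a closed-form scalar solution), and the helper names `init`, `sgd`, `als`, and `mse` are hypothetical. Both routines fit the same dot product model; only the update rule differs. The L2 penalty is also weighted slightly differently in the two routines, so they are not expected to land on identical factors.

```python
import random

# Toy (user, movie, rating) triples; hypothetical data for illustration only.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
n_users, n_movies, reg = 3, 3, 0.1


def init(seed=0):
    # Random starting factors, one number per user and one per movie.
    rng = random.Random(seed)
    p = [rng.uniform(0.5, 1.5) for _ in range(n_users)]
    q = [rng.uniform(0.5, 1.5) for _ in range(n_movies)]
    return p, q


def mse(p, q):
    # Mean squared error of the prediction p[user] * q[movie].
    return sum((r - p[u] * q[m]) ** 2 for u, m, r in ratings) / len(ratings)


def sgd(epochs=200, lr=0.01, seed=1):
    # Take a small gradient step for one rating at a time, in shuffled order.
    p, q = init()
    rng = random.Random(seed)
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, m, r in order:
            err = r - p[u] * q[m]
            pu, qm = p[u], q[m]
            p[u] += lr * (err * qm - reg * pu)
            q[m] += lr * (err * pu - reg * qm)
    return p, q


def als(sweeps=20):
    # Alternate: hold movie factors fixed and solve each user's regularised
    # least-squares problem exactly, then hold user factors fixed and do the
    # same for each movie.
    p, q = init()
    for _ in range(sweeps):
        for u in range(n_users):
            num = sum(r * q[m] for uu, m, r in ratings if uu == u)
            den = sum(q[m] ** 2 for uu, m, r in ratings if uu == u) + reg
            p[u] = num / den
        for m in range(n_movies):
            num = sum(r * p[u] for u, mm, r in ratings if mm == m)
            den = sum(p[u] ** 2 for u, mm, r in ratings if mm == m) + reg
            q[m] = num / den
    return p, q


start_error = mse(*init())
sgd_error = mse(*sgd())
als_error = mse(*als())
print(start_error, sgd_error, als_error)
```

With more than one latent factor, each ALS step becomes a small linear system per user or per movie rather than a single division, but the alternating structure stays the same.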

janardhanp22 · November 23, 2016, 12:14am

The core idea of starting from random movie latent factors and user latent factors is the same in all collaborative filtering techniques; only the optimization method varies (GD, SGD, ALS, etc.), right?

jeremy · November 23, 2016, 12:38am

Exactly right. Except for the neural net example we provided, which uses a somewhat different approach.
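
For illustration, the difference in approach can be sketched as two ways of turning a user vector and a movie vector into a predicted rating. Everything here is an assumption: the sizes, the random initial values, the names `predict_dot` and `predict_nn`, and the choice of one ReLU hidden layer over the concatenated vectors. It is not a copy of the notebook's model.

```python
import random

rng = random.Random(0)
n_factors, n_hidden = 3, 4  # hypothetical sizes

# Randomly initialised latent factors for one user and one movie.
user_vec = [rng.gauss(0, 0.1) for _ in range(n_factors)]
movie_vec = [rng.gauss(0, 0.1) for _ in range(n_factors)]


def predict_dot(u, m):
    # Standard approach: the prediction is the dot product of the two vectors.
    return sum(a * b for a, b in zip(u, m))


# Extra weights that exist only in the neural net variant.
W1 = [[rng.gauss(0, 0.1) for _ in range(2 * n_factors)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [rng.gauss(0, 0.1) for _ in range(n_hidden)]
b2 = 0.0


def predict_nn(u, m):
    # Neural net approach: concatenate the vectors and pass them through
    # a hidden dense layer (ReLU) and a single linear output unit.
    x = u + m
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


print(predict_dot(user_vec, movie_vec), predict_nn(user_vec, movie_vec))
```

In the dot product version the only trainable values are the latent factors themselves; in the neural net version `W1`, `b1`, `W2`, and `b2` are learned as well, and the network is free to combine the factors in ways other than an element-wise product.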