I’ve been trying to wrap my head around the neural network implementation of collaborative filtering. Is there a significant difference between the two algorithms, other than (a) we now have regularization and (b) we are using stochastic gradient descent rather than the spreadsheet’s finite differencing?

It looks like the number of weights is the same: latent_factor X n_users + latent_factor X n_movies. Unless I am missing something.
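As a quick sanity check on that count, here is the arithmetic with made-up sizes (the user/movie counts below are assumptions, not the notebook’s actual data):

```python
# Parameter count for the plain dot-product model: one latent vector per
# user and one per movie, no biases. Sizes here are illustrative only.
n_users, n_movies, n_factors = 671, 9066, 50

n_params = n_factors * n_users + n_factors * n_movies
print(n_params)  # 486850
```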

Also, what is finite differencing exactly? Is it basically that, instead of computing gradients analytically with derivatives, we approximate them by seeing whether small increases or decreases in a weight raise or lower the objective function?
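That’s essentially it. A minimal sketch of the idea, using a made-up one-parameter loss (the function and names are mine, not from the spreadsheet):

```python
# Finite differencing: approximate dL/dw by nudging w slightly in each
# direction and measuring how the loss changes.
def loss(w):
    return (w * 3.0 - 6.0) ** 2  # analytic gradient: 2 * (3w - 6) * 3

def finite_diff_grad(f, w, eps=1e-6):
    # Central difference: (f(w + eps) - f(w - eps)) / (2 * eps)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 1.0
approx = finite_diff_grad(loss, w)
exact = 2 * (3 * w - 6) * 3
print(approx, exact)  # both close to -18.0
```

It gives nearly the same number as the derivative, but needs two loss evaluations per weight, which is why it doesn’t scale to large models the way backprop does.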

@ben.bowles sorry, on re-reading your question, I misspoke. The ‘dot product’ and ‘bias’ sections in the notebook show methods that are very much like standard collaborative filtering. The ‘neural net’ section is not like them, however. If you run that section and call ‘nn.summary()’, you’ll see that the model has many more parameters, since each dense layer has its own set of weights.
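To make the difference concrete, here is a rough parameter count for the neural-net variant, assuming the two embeddings are concatenated and fed through one hidden dense layer (the sizes and hidden width below are assumptions, not the notebook’s exact values):

```python
# Rough parameter count for the 'neural net' variant: embeddings plus
# dense layers. All sizes here are illustrative assumptions.
n_users, n_movies, n_factors, hidden = 671, 9066, 50, 70

embedding_params = n_factors * (n_users + n_movies)       # same as dot-product model
dense1_params = (2 * n_factors) * hidden + hidden         # weights + biases
dense2_params = hidden * 1 + 1                            # final rating output
total = embedding_params + dense1_params + dense2_params
print(embedding_params, total)  # 486850 493991
```

The embeddings match the dot-product model’s count; everything beyond that comes from the dense layers, which is what shows up in the summary.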

The core idea of starting from random movie and user latent factors is the same in all of these collaborative filtering techniques; only the optimization method varies (GD, SGD, ALS, etc.), right?
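Right, and that shared core can be sketched in a few lines. Below is one SGD step of plain matrix factorization (sizes, learning rate, and the single observed rating are all made up for illustration):

```python
import numpy as np

# Shared core of collaborative filtering: random latent factors, updated
# to shrink the squared rating error. All sizes here are illustrative.
rng = np.random.default_rng(0)
n_users, n_movies, k, lr = 5, 7, 3, 0.1
U = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
M = rng.normal(scale=0.1, size=(n_movies, k))  # movie latent factors

u, m, rating = 2, 4, 4.0                       # one observed (user, movie, rating)
err = rating - U[u] @ M[m]                     # prediction error
# Simultaneous SGD update of both factor vectors (no regularization here)
U[u], M[m] = U[u] + lr * err * M[m], M[m] + lr * err * U[u]

new_err = rating - U[u] @ M[m]
print(abs(new_err) < abs(err))  # the step reduces this example's error
```

Swapping the optimizer (full-batch GD, ALS, etc.) changes how the update is computed, but not the latent-factor parameterization itself.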