@jeremy Hello Jeremy,
First of all thanks a lot for this series of classes on deep learning which actually supported me in changing the course of my career.
I have a quick comment on this lesson 4 regarding the illustration of collaborative filtering in Excel. You introduce the importance of having a bias term in the user and movies embeddings in order to take into account user or movie specifics. By adding the bias term, you show that the RMSE of the small dataset in Excel drops from 0.39 to 0.32, and conclude that bias is a useful addition to these matrices.
I do not believe the conclusion nor the reasoning is correct. You essentially increase your model’s parameters size by 20% (going from 5 to 6 rows for movies and users) and observe a decrease of the RMSE on your training set. This would be the case for any other model. In fact, I am wondering if you would have seen a different effect if you would have just added another row to the weight matrices (not in the bias).
My intuition is that the “bias” effect can be fully captured by just having on average lower/higher weights for a given movie or user and the same effect can still be achieved by simple matrix multiplication.
This may be harder for the model to learn (and hence the benefit of bias), but I find the process to reach your conclusion arguable.
Thanks again for the course, I’ve been following and re-watching them for the past years!