I’m looking for clarification as to why we use the dot product as a “similarity” metric for two vectors in collaborative filtering. For example, imagine two users, each with an embedding value of 0.5. Their dot product is 0.25 — the same “similarity” score as two users with values of 0.25 and 1, which are quite far apart. So why use the dot product?

I had the same question when I tried to understand zero-shot learning with CLIP. To my understanding, the dot product of normalized (length-1) vectors is the same as the cosine similarity of the two vectors.
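To see this numerically, here is a minimal sketch (the vector values are made up for illustration) showing that dividing the dot product by the norms gives the same number as normalizing first and then taking a plain dot product:

```python
import numpy as np

# Two arbitrary embedding vectors (made-up values for illustration)
a = np.array([0.5, 1.0])
b = np.array([0.25, 1.0])

# Cosine similarity: dot product divided by the product of the norms
cos_sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize each vector to length 1, then take a plain dot product
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
dot_of_normalized = a_hat @ b_hat

print(np.isclose(cos_sim, dot_of_normalized))  # True
```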

I think you should take a vector example rather than a scalar one. A scalar is just a point, so it has no direction or length to compare.

For example, I add another dimension (with value 1) to your example, so you will compare two pairs of vectors: [0.5, 1] and [0.5, 1] versus [0.25, 1] and [1, 1].

For comparison, you first normalize each vector so they all have length 1:
[0.5, 1] becomes [0.45, 0.89]
[0.25, 1] becomes [0.24 , 0.97]
[1, 1] becomes [0.71, 0.71]

=> Dot product of normalized [0.5, 1] with itself = 1.0 (a unit vector dotted with itself is always 1)
=> Dot product of normalized [0.25, 1] and [1, 1] ≈ 0.857

=> So [0.5, 1] and [0.5, 1] are more similar than [0.25, 1] and [1, 1].
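The arithmetic above can be checked in a few lines (same vectors as the example; `unit` is just a helper name I made up for normalization):

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1 (L2 normalization)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Pair 1: identical vectors -> similarity of a unit vector with itself
sim_same = unit([0.5, 1.0]) @ unit([0.5, 1.0])   # ≈ 1.0

# Pair 2: [0.25, 1] vs [1, 1]
sim_diff = unit([0.25, 1.0]) @ unit([1.0, 1.0])  # ≈ 0.857

print(sim_same, sim_diff)
```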

For intuition, as @florianl mentioned, this is linked to cosine similarity: you project one vector onto the other vector’s direction and multiply the lengths.

But in collaborative filtering the dot product is not between two users, but between a user and an item.
So 0.5 in the user embedding is how much a user “wants” that feature, and 0.5 in the item/movie embedding is how much that movie has that feature.
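A tiny sketch of that user–item view (the three “features” and all values here are hypothetical, just to show why a raw dot product makes sense as a preference score):

```python
import numpy as np

# Hypothetical 3-feature embeddings, e.g. (action, romance, sci-fi)
user    = np.array([0.9, 0.1, 0.8])  # how much this user wants each feature
movie_a = np.array([0.8, 0.0, 0.9])  # how much each movie has each feature
movie_b = np.array([0.1, 0.9, 0.0])

# Predicted affinity = dot product of user and item embeddings:
# each feature contributes (user wants it) * (movie has it)
score_a = user @ movie_a
score_b = user @ movie_b

# The model would rank movie_a above movie_b for this user
print(score_a, score_b)
```

Note that here the magnitudes matter (a strong match on a strongly-wanted feature should score high), which is why collaborative filtering often uses the raw dot product rather than the normalized cosine similarity.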