Collaborative Filtering without Ratings

If I am trying to develop a Collaborative Filtering (CF) model without any explicit ratings/scores, I can infer ratings/scores from some sort of observed behavior and use those instead.

For the sake of simplicity, let’s say that for each second a user stays on a page, the interaction is awarded half a point, up to a maximum of 5 points. We can model this as:

score(user, page) = min(0.5 * time_spent_on_webpage, 5), where time_spent_on_webpage is measured in seconds
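
As a quick Python sketch of that rule (the function and variable names are just illustrative):

def score(time_spent_on_webpage):
    # half a point per second on the page, capped at 5 points
    return min(0.5 * time_spent_on_webpage, 5.0)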

And so we can use this trivial model to populate our utility matrix. Afterward, we apply a collaborative filtering technique to decompose the utility matrix into two lower-rank matrices.

In the last step we factorize the utility matrix into two lower-rank matrices, which we’ll call A and B. So effectively, what we’ve done here is create a new model that will predict the time a user will spend on a given webpage.
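
Put another way, once the factorization is done, the prediction for a (user, page) pair is just the dot product of the corresponding rows of A and B. A rough sketch with made-up sizes (not the lesson code):

import numpy as np

n_users, n_pages, n_factors = 1000, 500, 40  # made-up sizes
A = np.random.randn(n_users, n_factors)      # user factors, one row per user
B = np.random.randn(n_pages, n_factors)      # page factors, one row per page

def predicted_score(user_id, page_id):
    # reconstructed entry of the utility matrix, i.e. the predicted score for this pair
    return A[user_id] @ B[page_id]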

Is the understanding here correct?

I assume you are referring to Lesson 4 of Practical Deep Learning for Coders. Your understanding seems right to me. However, I do not recall any reference to “decomposing the utility matrix”, though finding the two embedding matrices via SGD can be formalized that way.

So effectively, what we’ve done here is create a new model that will predict the time a user will spend on a given webpage.

…given the times a user has spent on some of the other pages.

One point is that you are not required to limit the rating to [0.5, 5]. You could use the actual number of minutes, perhaps with an upper cap to limit outliers. There’s a spot in Lesson 4 where Jeremy maps the sigmoid output into the range of the ratings - you would have to adjust that calculation. Otherwise, Lesson 4 applies.
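
For reference, that calculation squashes the model’s raw output into the rating range with a scaled sigmoid, so you would just swap in your own minimum and maximum. A sketch of the idea (not the exact lesson code):

import torch

def scaled_prediction(raw_score, y_min=0.0, y_max=5.0):
    # map an unbounded model output into [y_min, y_max]
    return y_min + (y_max - y_min) * torch.sigmoid(raw_score)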

Hey Larry,
Were you able to get some results on this? I am attempting to do something similar with ‘bought’ or ‘did not buy’ data [0,1] (mostly to extract interesting embeddings). I believe your intuition is correct as long as the y_range is adjusted accordingly.

The general approach I have is:

from fastai.collab import *  # fastai v1

# keep just the (user, item, target) columns; the target is the 0/1 purchase flag
df_filt = df[['user_id', 'product_id', 'purchase_binary']]
data = CollabDataBunch.from_df(df_filt, seed=42)
y_range = [-0.01, 1.1]  # slightly wider than [0, 1] so the sigmoid can reach both ends
learn = collab_learner(data, n_factors=5, y_range=y_range, wd=0.1)
learn.fit(3, 1e-1, wd=0.1)
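
Since the embeddings are what I’m really after: once it’s trained, the learned product vectors should be sitting in the model’s item weight matrix (this assumes fastai v1’s EmbeddingDotBias model, where row 0 is a padding entry):

item_emb = learn.model.i_weight.weight  # shape roughly (n_products + 1, n_factors)
print(item_emb.shape)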

I’m wondering if you got some interesting results?

I’m struggling with the LR finder, as it keeps showing decreasing loss even at 1e+00.

But otherwise I’m getting a validation loss of 0.0103, which seems promising.

Bump. Very curious to see a fastai implicit feedback implementation if you can share the notebook!

Juan Pablo, I tried your method on the MovieLens dataset (I removed the ratings data and created a new column “watched”, assigning a 0 for not watched and a 1 for watched for any given user-title combination). The code runs with no errors.
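
For what it’s worth, this is roughly how I build that column (a sketch, assuming the usual MovieLens ratings.csv layout with userId and movieId columns):

import pandas as pd

ratings = pd.read_csv('ratings.csv')  # userId, movieId, rating, timestamp

# every possible user-title combination
pairs = pd.MultiIndex.from_product(
    [ratings.userId.unique(), ratings.movieId.unique()],
    names=['userId', 'movieId']).to_frame(index=False)

# mark 1 where the user actually rated (i.e. watched) the title, 0 otherwise
rated = ratings[['userId', 'movieId']].assign(watched=1)
pairs = pairs.merge(rated, on=['userId', 'movieId'], how='left')
pairs['watched'] = pairs['watched'].fillna(0).astype(int)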

My main goal is to see whether the embeddings for the movies make sense, i.e. whether they do a good job of identifying movies that are similar to any given movie.

As an example, I generated a list of the 30 movies most similar to “Silence of the Lambs”, following the approach in the book, using the ratings data. I then did the same without the ratings data (using the watched column instead, as explained above) and found that only 3 of the 30 closest movies in the new list appeared in the original list, which does not give me much confidence in the effectiveness of this approach.
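
For reference, this is roughly how I compute the neighbourhoods in both cases (a sketch, assuming a trained fastai v1 collab_learner called learn, the learn.weight helper from the lesson, and that every title appears in the training data):

import torch

titles = list(df['title'].unique())
movie_emb = learn.weight(titles, is_item=True)  # one embedding vector per title

# cosine similarity of every title to the target movie, then take the 30 closest
idx = titles.index('Silence of the Lambs, The (1991)')
sims = torch.nn.functional.cosine_similarity(movie_emb, movie_emb[idx][None, :], dim=1)
closest = [titles[i] for i in sims.argsort(descending=True)[1:31].tolist()]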

So I am wondering if:

  1. My approach is erroneous
  2. Using embeddings without ratings actually yields any usable information for creating neighbourhoods (movies similar to a given movie)

Does anyone have any insight on this?

Maybe this has to do with the loss function, assuming you are still using MSE? In the case of a binary outcome, I think you should use a cross entropy loss.
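
If you want to try that within the same fastai v1 setup, something along these lines should work as a sketch (BCEWithLogitsFlat applies the sigmoid itself, so I would drop y_range and let the model output raw logits):

from fastai.collab import collab_learner
from fastai.layers import BCEWithLogitsFlat

learn = collab_learner(data, n_factors=5, wd=0.1)  # no y_range: model outputs raw logits
learn.loss_func = BCEWithLogitsFlat()
learn.fit(3, 1e-1)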