Collaborative Filtering without Ratings

If I want to build a Collaborative Filtering (CF) model but have no explicit ratings/scores, I can infer them from some sort of observed user behavior.

For the sake of simplicity, let’s say that for each second a user stays on a page, the page is awarded half a point, up to a maximum of 5 points. We can model this as:

score(user, page) = 0.5 * time_spent_on_webpage, where 1 <= time_spent_on_webpage <= 10 (in seconds, so the score lies in [0.5, 5])
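A minimal sketch of that scoring rule in plain Python, in case it helps to see it concretely. The function name, the event list, and the dictionary used as a sparse utility matrix are all hypothetical; only the half-point-per-second rate and the 5-point cap come from the description above.

```python
def implicit_score(seconds_on_page, points_per_second=0.5, max_points=5.0):
    """Turn dwell time (in seconds) into a pseudo-rating capped at max_points."""
    if seconds_on_page < 0:
        raise ValueError("dwell time cannot be negative")
    return min(max_points, points_per_second * seconds_on_page)


# Hypothetical dwell-time log: (user, page, seconds on page)
events = [("u1", "home", 3), ("u1", "pricing", 14), ("u2", "home", 8)]

# Sparse utility matrix stored as {(user, page): score};
# pairs that never occur in the log are simply missing.
utility = {(user, page): implicit_score(secs) for user, page, secs in events}
```

Unvisited pages are left out rather than scored as zero, which is what lets the factorization step later fill them in.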

We can then use this trivial model to populate our utility matrix. Afterward, we apply a collaborative filtering technique to decompose the utility matrix into two lower-rank matrices.

In the last step, we factorize the utility matrix into two lower-rank matrices, which we’ll call A and B respectively. So effectively, what we have done here is we’ve created a new model that will predict the time a user will spend on a given webpage.
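To make the factorization step concrete, here is a rough sketch that fits A and B with plain SGD on the observed entries only. It is an illustration under assumptions, not the Lesson 4 code: the function names, the toy scores, and the hyperparameters (n_factors, epochs, lr, reg) are all made up for the example.

```python
import random


def factorize(scores, n_factors=2, epochs=1000, lr=0.05, reg=0.01, seed=0):
    """Fit user factors A and page factors B so that dot(A[u], B[p]) ~ score."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in scores})
    pages = sorted({p for _, p in scores})
    A = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    B = {p: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for p in pages}
    for _ in range(epochs):
        # Only observed (user, page) pairs contribute to the loss.
        for (u, p), target in scores.items():
            err = sum(a * b for a, b in zip(A[u], B[p])) - target
            for k in range(n_factors):
                a, b = A[u][k], B[p][k]
                # Gradient step on squared error with L2 regularization.
                A[u][k] -= lr * (err * b + reg * a)
                B[p][k] -= lr * (err * a + reg * b)
    return A, B


def predict(A, B, user, page):
    """Predicted score for any (user, page) pair, observed or not."""
    return sum(a * b for a, b in zip(A[user], B[page]))


# Hypothetical scores produced by the dwell-time rule.
scores = {
    ("u1", "home"): 1.5, ("u1", "pricing"): 5.0,
    ("u2", "home"): 4.0, ("u2", "docs"): 2.0,
    ("u3", "pricing"): 4.5, ("u3", "docs"): 1.0,
}
A, B = factorize(scores)
unseen = predict(A, B, "u1", "docs")  # a pair that was never observed
```

Strictly speaking the model predicts the score, and a predicted time follows by inverting the scoring rule (here, dividing by 0.5).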

Is the understanding here correct?

I assume you are referring to Lesson 4 of Practical Deep Learning for Coders. Your understanding seems right to me. However, I do not recall any reference to “decomposing the utility matrix”, though finding the two embeddings via SGD can be formalized that way.

So effectively, what we have done here is we’ve created a new model that will predict the time a user will spend on a given webpage.

…given the times a user has spent on some of the other pages.

One point is that you are not required to limit the rating to [.5,5]. You could use the actual number of minutes, perhaps with an upper cap to limit outliers. There’s a spot in Lesson 4 where Jeremy scales a sigmoid into the range of the ratings, and you would have to adjust that calculation to match your new range. Otherwise, Lesson 4 applies.
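For reference, the sigmoid adjustment amounts to something like the sketch below. This is the general idea written in plain Python rather than code taken from the lesson or from fastai; the function name and the example ranges (including the 30-minute cap) are assumptions for illustration.

```python
import math


def scaled_sigmoid(raw, y_range):
    """Squash an unbounded model output into the interval (low, high)."""
    low, high = y_range
    # Numerically stable sigmoid: avoids overflow for large negative inputs.
    if raw >= 0:
        s = 1.0 / (1.0 + math.exp(-raw))
    else:
        e = math.exp(raw)
        s = e / (1.0 + e)
    return low + (high - low) * s


# Range for the 0.5 to 5 point scores, padded slightly at the top
# because a sigmoid only approaches its upper limit.
as_points = scaled_sigmoid(0.0, (0.0, 5.5))

# Hypothetical alternative: raw minutes capped at 30.
as_minutes = scaled_sigmoid(2.0, (0.0, 30.5))
```

Whatever scale you choose for the target, the range passed here has to cover it, otherwise the model can never output the largest values.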

Hey Larry,
Were you able to get some results on this? I am attempting to do something similar with ‘bought’ or ‘did not buy’ data [0,1] (mostly to extract interesting embeddings). I believe your intuition is correct as long as the y_range is adjusted accordingly.

The general approach I have is:

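from fastai.collab import *  # fastai v1 API (CollabDataBunch, collab_learner)

# df is assumed to be a DataFrame with one row per (user, product) pair
df_filt = df[['user_id','product_id','purchase_binary']]
# Columns are read in order: user, item, rating
data = CollabDataBunch.from_df(df_filt, seed=42)
# Range slightly wider than [0, 1] so the sigmoid can reach both labels
y_range = [-.01,1.1]
learn = collab_learner(data, n_factors=5, y_range=y_range, wd=0.1)
learn.fit(3, 1e-1, wd=0.1)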

I’m wondering if you got some interesting results?

I’m struggling with the lr finder, as it keeps showing a decreasing loss even at a learning rate of 1e+00.

Otherwise, I’m getting a validation loss of 0.0103, which seems promising.

Bump. Very curious to see a fastai implicit feedback implementation if you can share the notebook!