Implicit Collaborative Filtering - Approach for Adding Negative Class


(Graham McAlister) #1

I’m looking to use the fast.ai library for a collaborative filtering problem with implicit rankings. All I have is a set of users and the items they searched for, so every ranking is a 1 and there are no zeros. What’s the appropriate way to add zeros to this data set for training?

FWIW the data set is large: a users-by-items ranking table like the one Jeremy made in the lessons would have ~5B cells. That obviously isn’t feasible to load into memory, so I’m thinking of randomly sampling items that each user didn’t search for and adding them as zeros to the DataFrame before it’s transformed into a DataBunch.
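
A minimal sketch of that sampling idea, using only the Python standard library and no fast.ai API. The function name `sample_negatives`, the `neg_per_pos` ratio, and the integer user/item ids are assumptions made for illustration. Note that a sampled zero only means "no search was observed", not a confirmed dislike.

```python
import random
from collections import defaultdict


def sample_negatives(positives, n_items, neg_per_pos=1, seed=0):
    """Return (user, item, rating) rows: observed pairs as 1, sampled unseen pairs as 0.

    positives   -- iterable of (user, item) pairs; item ids assumed to be 0..n_items-1
    neg_per_pos -- hypothetical knob: negatives drawn per distinct positive of a user
    """
    rng = random.Random(seed)

    # Group the distinct items each user searched for.
    seen = defaultdict(set)
    for user, item in positives:
        seen[user].add(item)

    rows = []
    for user, items in seen.items():
        rows.extend((user, item, 1) for item in sorted(items))

        # Never ask for more negatives than there are unseen items.
        wanted = min(len(items) * neg_per_pos, n_items - len(items))

        # Rejection sampling: cheap when each user has seen only a small
        # fraction of the catalogue, which is the usual implicit-feedback case.
        negatives = set()
        while len(negatives) < wanted:
            candidate = rng.randrange(n_items)
            if candidate not in items:
                negatives.add(candidate)

        rows.extend((user, item, 0) for item in sorted(negatives))
    return rows


# Toy data: 3 users, 5 items, 4 observed searches.
searches = [(0, 1), (0, 3), (1, 0), (2, 2)]
train_rows = sample_negatives(searches, n_items=5)
```

Because only the sampled rows are materialised, memory grows with the number of searches rather than with the full users-by-items table. The rows could then be loaded with `pd.DataFrame(train_rows, columns=["user", "item", "rating"])`; those column names are placeholders.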

Does this approach make sense? Is there something glaringly obvious that I’m missing? Apologies if so, I’m trying to learn just enough deep learning at the moment to tackle this particular problem 🙂