Booleans and Collaborative Filtering

All, seeking some advice on a collaborative filtering problem! I’m trying to build a recommender for users, each of whom may or may not have any of around 1,500 features, so the values are boolean rather than numerical.

E.g., a user may select features 1, 2 and 3, and I’d like to recommend features 5, 6 and 7 because other users who have 1, 2 and 3 also tend to have those.

My question is how I prepare the data… the tutorial based on Netflix movies obviously has each movie as either watched with a rating, or absent. In my data, however, I’m treating each user as having or not having each and every skill. E.g., if a user in my training set has features 2, 3 and 4, I assume they do not have any other feature (rather than treating those features as absent).
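To make the setup concrete, here is a minimal sketch of that preparation, not a definitive implementation. It assumes feature ids run from 1 to 1,500 and that the raw data is a mapping from user id to the set of features that user has; the names `user_features` and `to_boolean_matrix` and the toy users are made up for illustration.

```python
import numpy as np

N_FEATURES = 1500  # "around 1,500" from the post; ids assumed to be 1..1500

# Hypothetical toy data: user id -> set of feature ids the user has.
user_features = {
    "u1": {1, 2, 3, 5, 6, 7},
    "u2": {2, 3, 4},
    "u3": {1, 2, 3},
}

def to_boolean_matrix(user_features, n_features):
    """Return (user_ids, matrix) with matrix[i, j] == 1 iff user i has feature j + 1."""
    user_ids = sorted(user_features)
    # Start from all zeros: every feature not listed is treated as "does not have".
    matrix = np.zeros((len(user_ids), n_features), dtype=np.uint8)
    for row, uid in enumerate(user_ids):
        cols = [f - 1 for f in user_features[uid]]  # shift to 0-based columns
        matrix[row, cols] = 1
    return user_ids, matrix

user_ids, X = to_boolean_matrix(user_features, N_FEATURES)
```

Every cell is filled in here, which is exactly the "not having" assumption described above; the Netflix-style alternative would leave the unlisted cells out of training altogether.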

Does that make sense? Is there a better way to approach collaborative filtering with a large number of boolean features?


If you only have Boolean values, that does not matter! The standard example of collaborative filtering uses Netflix with its 1-to-5 rating system, where the rating indicates how likely something is to be liked. With Booleans, the value is just an absolute measure: the user either has the feature or does not.

My suggestion: just go with a Boolean matrix and test it! The results should be on a scale between 0 and 1, and if you look at one of the lectures by Jeremy, where he used Excel to demonstrate the logic behind collaborative filtering, they should even be 0 or 1 (and nothing in between).
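If it helps to see the idea outside of any library, here is a rough sketch in plain NumPy. It is not the fastai learner and not the Excel sheet from the lecture, just a small logistic matrix factorization I am assuming as a stand-in: user and feature factors are multiplied and squashed through a sigmoid so every score lands between 0 and 1. The function names, hyperparameters and the toy matrix are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def factorize(X, n_factors=2, lr=0.5, epochs=500, seed=0):
    """Fit X ~ sigmoid(U @ V.T) by gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n_users, n_items = X.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, n_factors))  # feature factors
    for _ in range(epochs):
        G = sigmoid(U @ V.T) - X  # gradient of the loss w.r.t. the logits
        # Simultaneous update so both gradients use the same old U and V.
        U, V = U - lr * (G @ V) / n_items, V - lr * (G.T @ U) / n_users
    return U, V

def recommend(scores, X, user):
    """Features the user does not have yet, highest predicted score first."""
    missing = np.flatnonzero(X[user] == 0)
    return [int(j) for j in missing[np.argsort(-scores[user, missing])]]

# Hypothetical toy Boolean matrix: 4 users x 6 features.
X = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
], dtype=float)

U, V = factorize(X)
scores = sigmoid(U @ V.T)    # every entry lies between 0 and 1
recs = recommend(scores, X, user=1)
```

Because the zeros are trained on as real negatives, the scores for features a user lacks get pushed down; ranking those features by score is what turns the fitted matrix into recommendations.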

Just give it a whirl. If it does not work as expected, you can post some details, and I am sure someone can offer more advice.
