Hi, I am trying to implement collaborative filtering for an implicit-feedback dataset, based on Lesson 5 (https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb) and this Spotlight code (https://github.com/maciejkula/spotlight/blob/master/spotlight/factorization/implicit.py). The issue is that the loss wiggles around 1.0 and does not go down. I have tried different learning rates, weight decay, etc. I must be doing something wrong, or there is a mistake somewhere in my code. Here is the full code:

```
from fastai.imports import *
from fastai.column_data import *
class ScaledEmbedding(nn.Embedding):
    """Embedding whose weights are initialised from N(0, 1/embedding_dim)."""

    def reset_parameters(self):
        # Shrinking the init std with the embedding width keeps initial
        # user·item dot products at a comparable scale for any n_factors.
        std = 1.0 / self.embedding_dim
        self.weight.data.normal_(0, std)
        if self.padding_idx is not None:
            # The padding row must contribute nothing, so force it to zero.
            self.weight.data[self.padding_idx].fill_(0)
class ZeroEmbedding(nn.Embedding):
    """Embedding initialised entirely to zero (used for the bias terms)."""

    def reset_parameters(self):
        # Biases start flat at zero and are learned from scratch.
        nn.init.zeros_(self.weight)
        if self.padding_idx is not None:
            nn.init.zeros_(self.weight.data[self.padding_idx])
class ImplicitFeedBackModel(nn.Module):
    """Matrix-factorisation model for implicit feedback.

    The forward pass scores each observed (user, item) pair and, for the
    same users, one uniformly sampled "negative" item, so a pointwise loss
    can push positive scores above negative ones.
    """

    def __init__(self, n_factors, n_users, n_items):
        super().__init__()
        self.n_items = n_items
        # Private RNG used for negative sampling (unseeded, so sampling
        # differs between runs).
        self._random_state = np.random.RandomState()
        self.u = ScaledEmbedding(n_users, n_factors)   # user latent factors
        self.i = ScaledEmbedding(n_items, n_factors)   # item latent factors
        self.ub = ZeroEmbedding(n_users, 1)            # per-user bias
        self.ib = ZeroEmbedding(n_items, 1)            # per-item bias

    def _score(self, users, items):
        # Factor dot product plus both bias terms for each pair.
        dot = (self.u(users) * self.i(items)).sum(1)
        return dot + self.ub(users).squeeze() + self.ib(items).squeeze()

    def forward(self, users, items):
        """Return (positive_predictions, negative_predictions).

        NOTE(review): negatives are drawn uniformly from all items and may
        collide with the true positive item — confirm this is acceptable.
        """
        sampled = self._random_state.randint(0, self.n_items, len(users), dtype=np.int64)
        negatives = V(sampled)
        return (self._score(users, items), self._score(users, negatives))
class ImplicitCollabFilterDataset(CollabFilterDataset):
    """CollabFilterDataset variant that builds the implicit-feedback model."""

    def get_model(self, n_factors):
        # Wrap the raw module so fastai's training machinery can drive it.
        net = ImplicitFeedBackModel(n_factors, self.n_users, self.n_items)
        return CollabFilterModel(to_gpu(net))

    def get_learner(self, n_factors, val_idxs, bs, **kwargs):
        data = self.get_data(val_idxs, bs)
        return ImplicitCollabFilterLearner(data, self.get_model(n_factors), **kwargs)
class ImplicitCollabFilterLearner(Learner):
    """Learner that always trains with the implicit-feedback pointwise loss."""

    def __init__(self, data, models, **kwargs):
        super().__init__(data, models, **kwargs)

    def _get_crit(self, data):
        # Override the default criterion regardless of the dataset's choice.
        return pointwise_loss
def pointwise_loss(x, y):
    """Pointwise implicit-feedback loss.

    x is a (positive_predictions, negative_predictions) pair of score
    tensors; y (the explicit rating target) is ignored, since training is
    driven purely by the positive/negative contrast.
    """
    pos_scores, neg_scores = x
    # Drive positive scores toward sigmoid == 1 and negatives toward 0.
    per_pair = (1.0 - F.sigmoid(pos_scores)) + F.sigmoid(neg_scores)
    return per_pair.mean()
# --- Script entry: train on the MovieLens "latest-small" ratings ---
path = 'data/ml-latest-small/ml-latest-small/'
ratings = pd.read_csv(path + 'ratings.csv')

val_idxs = get_cv_idxs(len(ratings))   # default cross-validation index split
n_factors = 32                         # embedding dimensionality
bs = 256                               # mini-batch size

dataset = ImplicitCollabFilterDataset.from_csv(path, 'ratings.csv', 'userId', 'movieId', 'rating')
learner = dataset.get_learner(n_factors, val_idxs, bs, opt_fn=optim.Adam)
learner.fit(1e-2, 3, cycle_len=1, cycle_mult=2)
```

Perhaps someone here has already tried to implement this approach and can spot what I am missing.

Please advise.

Thanks.