Please help with collaborative filtering (lesson 4)

Hi everyone,

I’m working on a recommendation engine model and am following the approach from lesson 4.

I want to learn embedding factors for users and targets, store them in a database, and at test time score a user against a subset of targets with a dot product between their vectors.
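For reference, this is roughly what I have in mind for the test-time step, with the factor matrices standing in for whatever gets loaded from the database (the shapes, names, and random values below are made up for illustration):

```python
import numpy as np

# Toy sketch of offline scoring: pretend these factor matrices were exported
# from the trained model and loaded from the database.
rng = np.random.default_rng(0)
n_users, n_targets, n_factors = 1000, 5000, 50
user_factors = rng.standard_normal((n_users, n_factors))
target_factors = rng.standard_normal((n_targets, n_factors))

def score(user_id, candidate_ids):
    """Dot product of one user's factors with a subset of target factors."""
    return target_factors[candidate_ids] @ user_factors[user_id]

# Rank all targets for user 0 and keep the ten highest-scoring ones.
top10 = np.argsort(-score(0, np.arange(n_targets)))[:10]
```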

The model in lesson 4 just learns the embeddings and maps their dot product straight to the output:

# Keras 1.x imports
from keras.layers import Input, Embedding, Flatten, merge
from keras.models import Model
from keras.regularizers import l2
from keras.optimizers import Adam

user_in = Input(shape=(1,), dtype='int64', name='user_in')
u = Embedding(n_users, n_factors, input_length=1, W_regularizer=l2(reg_strength))(user_in)
target_in = Input(shape=(1,), dtype='int64', name='target_in')
m = Embedding(n_targets, n_factors, input_length=1, W_regularizer=l2(reg_strength))(target_in)
x = merge([u, m], mode='dot')
x = Flatten()(x)
model = Model([user_in, target_in], x)
model.compile(Adam(0.001), loss='binary_crossentropy', metrics=['accuracy'])

If I add some layers on top of the flattened dot product of the embeddings, I get much better accuracy (0.70 vs 0.34). Is this a valid way to learn better embeddings?

New model:

user_in = Input(shape=(1,), dtype='int64', name='user_in')
u = Embedding(n_users, n_factors, input_length=1, W_regularizer=l2(reg_strength))(user_in)
target_in = Input(shape=(1,), dtype='int64', name='target_in')
m = Embedding(n_targets, n_factors, input_length=1, W_regularizer=l2(reg_strength))(target_in)
x = merge([u, m], mode='dot')
x = Flatten()(x)

x = keras.layers.Dense(128, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.normalization.BatchNormalization()(x)
x = keras.layers.Dense(128, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.normalization.BatchNormalization()(x)
x = keras.layers.Dense(128, activation='relu')(x)
x = keras.layers.Dense(1, activation='sigmoid')(x)

model = Model([user_in, target_in], x)
model.compile(Adam(0.001), loss='binary_crossentropy', metrics=['accuracy'])
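One detail I'm unsure about is the output activation: with a single output unit and `binary_crossentropy`, sigmoid seems to be the right choice, since a softmax over one logit is constant. A quick check in plain numpy (not the actual model, just the two activations):

```python
import numpy as np

# With one output unit, softmax normalizes a single logit against itself,
# so it is always exactly 1.0; sigmoid on the same logit actually varies.
def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[-3.0], [0.0], [4.2]])  # three examples, one unit each
print(softmax(logits).ravel())  # -> [1. 1. 1.], regardless of the logit
print(sigmoid(logits).ravel())  # varies: roughly [0.047, 0.5, 0.985]
```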

Thanks in advance :blush:

Bump.

I started tinkering with extra layers because my accuracy seems low: I have roughly 100 million ratings from about 500k users (~200 per user), but I'm only getting around 60% accuracy.

What’s puzzling me is that:

  1. My accuracy stops improving after 2 epochs and plateaus at around 62%. My training and validation losses stop changing too.

  2. I see almost no change in accuracy when I vary the embedding size or the regularization strength.
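For what it's worth, I did wonder whether the stuck 62% is just the majority-class base rate of my labels, i.e. the model collapsing to a constant prediction. A toy check (the label array below is simulated with a 62% positive rate, not my real data):

```python
import numpy as np

# If accuracy is pinned at the majority-class rate, the model is effectively
# predicting the same class for every example.
rng = np.random.default_rng(0)
labels = (rng.random(100_000) < 0.62).astype(int)  # pretend 62% positives

base_rate = max(labels.mean(), 1 - labels.mean())      # majority-class baseline
always_one_accuracy = (np.ones_like(labels) == labels).mean()

print(round(base_rate, 3))          # close to 0.62 for these labels
print(round(always_one_accuracy, 3))
```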

I'd really appreciate some help with this, because an accuracy that won't move no matter what I change is incredibly frustrating :(

Thanks in advance!