Just a quick update on this: I upgraded to Keras 2.0.6 and ran the numbers with TensorFlow as the backend instead of Theano. While I saw some small differences between the two backends, the spread was much wider when I fiddled with the weight regularizers or changed the optimizer.
For example, with TF, in the Bias section, I can get all the way down to 0.973 simply by using RMSProp instead of Adam and by lowering the weight regularizers to 1e-9. I could run more epochs as well, but the validation loss bottoms out around 18-20 epochs.
Here's the part describing the model (had to modify it for Keras 2.0.6):
import tensorflow as tf
import keras
from keras.layers import Flatten
from keras.models import Model

# dot product of the user and movie factor embeddings, plus the two bias terms
x = keras.layers.dot([u, m], axes=2, normalize=False)
x = Flatten()(x)
x = keras.layers.add([x, ub])
x = keras.layers.add([x, mb])
model = Model([user_in, movie_in], x)
model.compile(optimizer=keras.optimizers.TFOptimizer(tf.train.RMSPropOptimizer(0.001)), loss='mse')
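For context, the snippet above assumes the inputs and embeddings are already defined. Here's a minimal sketch of how they might look, following the lesson notebook's setup (n_users, n_movies and n_factors as in the rest of the notebook) but with the regularizers lowered to 1e-9 as described above:

from keras.layers import Input, Embedding, Flatten
from keras.regularizers import l2

# one integer id per sample for each input
user_in = Input(shape=(1,), dtype='int64', name='user_in')
movie_in = Input(shape=(1,), dtype='int64', name='movie_in')
# latent factor embeddings, with the l2 penalty lowered to 1e-9
u = Embedding(n_users, n_factors, input_length=1, embeddings_regularizer=l2(1e-9))(user_in)
m = Embedding(n_movies, n_factors, input_length=1, embeddings_regularizer=l2(1e-9))(movie_in)
# per-user and per-movie bias terms, flattened to shape (None, 1)
ub = Flatten()(Embedding(n_users, 1, input_length=1)(user_in))
mb = Flatten()(Embedding(n_movies, 1, input_length=1)(movie_in))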
and the output when I fit the model:
model.fit([trn.userId, trn.movieId], trn.rating, batch_size=64, epochs=18,
validation_data=([val.userId, val.movieId], val.rating))
Train on 79906 samples, validate on 20098 samples
Epoch 1/18
79906/79906 [==============================] - 8s - loss: 11.7509 - val_loss: 9.2466
Epoch 2/18
79906/79906 [==============================] - 7s - loss: 5.8543 - val_loss: 3.8248
Epoch 3/18
79906/79906 [==============================] - 8s - loss: 2.8041 - val_loss: 2.3294
Epoch 4/18
79906/79906 [==============================] - 7s - loss: 1.8351 - val_loss: 1.7454
Epoch 5/18
79906/79906 [==============================] - 7s - loss: 1.4134 - val_loss: 1.4586
Epoch 6/18
79906/79906 [==============================] - 8s - loss: 1.1883 - val_loss: 1.2983
Epoch 7/18
79906/79906 [==============================] - 8s - loss: 1.0511 - val_loss: 1.2000
[.....]
Epoch 15/18
79906/79906 [==============================] - 7s - loss: 0.6468 - val_loss: 0.9850
Epoch 16/18
79906/79906 [==============================] - 7s - loss: 0.6206 - val_loss: 0.9790
Epoch 17/18
79906/79906 [==============================] - 8s - loss: 0.5953 - val_loss: 0.9763
Epoch 18/18
79906/79906 [==============================] - 7s - loss: 0.5730 - val_loss: 0.9730
So this still doesn't explain the differences between the "original" notebook and the class, but I believe Adam is not really the best optimizer for this type of application.
For the Dot Product section, while I kept Adam as the optimizer, I get much better results without specifying any weight regularizers:
from keras.layers import Embedding
# note: no embeddings_regularizer specified this time
u = Embedding(n_users, n_factors, input_length=1)(user_in)
m = Embedding(n_movies, n_factors, input_length=1)(movie_in)
which gets me (only showing the first 6 epochs):
Train on 80307 samples, validate on 19697 samples
Epoch 1/10
80307/80307 [==============================] - 5s - loss: 10.9750 - val_loss: 4.4705
Epoch 2/10
80307/80307 [==============================] - 4s - loss: 2.5244 - val_loss: 1.8511
Epoch 3/10
80307/80307 [==============================] - 4s - loss: 1.2538 - val_loss: 1.4383
Epoch 4/10
80307/80307 [==============================] - 4s - loss: 0.9073 - val_loss: 1.3123
Epoch 5/10
80307/80307 [==============================] - 4s - loss: 0.7445 - val_loss: 1.2630
Epoch 6/10
80307/80307 [==============================] - 4s - loss: 0.6430 - val_loss: 1.2449
In comparison, using a weight regularizer of 1e-5, as is done in the class, gets me around 1.41.
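For reference, the regularized variant from the class differs only in the embeddings_regularizer argument (a sketch, using the same names as above; in Keras 2 this keyword replaced the old W_regularizer used in the original notebook):

from keras.regularizers import l2

# same factor embeddings, but with the class's 1e-5 l2 weight penalty
u = Embedding(n_users, n_factors, input_length=1, embeddings_regularizer=l2(1e-5))(user_in)
m = Embedding(n_movies, n_factors, input_length=1, embeddings_regularizer=l2(1e-5))(movie_in)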
So I don't think there's an issue per se with the backend you're using, unless there's a bug in your specific version; however, depending on which weight regularizer coefficient and which optimizer you use, there seems to be a fairly big impact for a given model.
As a side note: adding the weight regularizers takes about twice as long for each epoch.
HTH,
N.