@rachel @jeremy
I’m really confused about whether this is working or not. Here are the important lines from the notebook:
import pandas as pd
ratings = pd.read_csv(path+'ratings.csv') # loads ratings.csv, whose columns are userId, movieId, rating, timestamp
users = ratings.userId.unique() # array of unique user IDs, in order of first appearance
movies = ratings.movieId.unique() # array of unique movie IDs, in order of first appearance
userid2idx = {o:i for i,o in enumerate(users)} # maps each user ID to its index in users
movieid2idx = {o:i for i,o in enumerate(movies)} # maps each movie ID to its index in movies
### CRITICAL ###
# Converts the ratings.movieId from being UID based to index based
ratings.movieId = ratings.movieId.apply(lambda x: movieid2idx[x])
# Converts the ratings.userId from being UID based to index based
ratings.userId = ratings.userId.apply(lambda x: userid2idx[x])
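To make the ID-to-index step concrete, here is a minimal pure-Python sketch of the same idea on a few hypothetical toy rows (not real MovieLens data). It builds both the forward map and the inverse map (the predict function below needs the inverse one for printing) and re-encodes the rows. `build_maps` is an illustrative name, not something from the notebook. In pandas, `pd.factorize(ratings.userId)` should give the same first-appearance codes in a single call.

```python
# Hypothetical toy rows: (userId, movieId, rating). Raw IDs are sparse and not 0-based.
rows = [(1, 110, 1.0), (1, 31, 2.5), (4, 1, 4.0), (3, 6, 3.5), (4, 110, 5.0)]

def build_maps(ids):
    """Map each distinct raw ID to a contiguous 0-based index, in order of first appearance."""
    unique_ids = list(dict.fromkeys(ids))  # dicts keep insertion order, like Series.unique()
    id2idx = {raw: i for i, raw in enumerate(unique_ids)}
    idx2id = {i: raw for raw, i in id2idx.items()}  # inverse map: index back to raw ID
    return id2idx, idx2id

userid2idx, idx2userid = build_maps(r[0] for r in rows)
movieid2idx, idx2movieid = build_maps(r[1] for r in rows)

# Re-encode the ratings so both ID columns become embedding-row indices.
indexed = [(userid2idx[u], movieid2idx[m], r) for u, m, r in rows]
print(indexed)  # [(0, 0, 1.0), (0, 1, 2.5), (1, 2, 4.0), (2, 3, 3.5), (1, 0, 5.0)]
```

Because the indices are contiguous from 0, they can be used directly as row numbers in an embedding table sized `len(users)` by the number of factors.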
Then when we train:
model.fit([trn.userId, trn.movieId], trn.rating, batch_size=64, epochs=1,
          validation_data=([val.userId, val.movieId], val.rating))
So we train on the indices, which means any call to predict MUST use indices, not the raw IDs.
I wrote this function to handle that (I’ve renamed some of the variables in my own notebook to be a bit more self-explanatory):
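The reason indices are mandatory is that an embedding layer is a table whose rows are addressed by position. The sketch below assumes a plain dot-product-plus-bias model with hypothetical toy weights (not the notebook's trained model) to show that a prediction only ever looks rows up by index. A raw ID such as 110 either falls outside the table or, if it happens to be in range, silently selects some other movie's row.

```python
# Hypothetical toy embedding tables: row i belongs to the user/movie whose index is i.
user_emb = [[0.1, 0.3], [0.5, -0.2], [0.9, 0.4]]             # 3 users, 2 latent factors
movie_emb = [[1.0, 2.0], [0.0, 1.5], [2.0, 0.5], [1.0, 1.0]]  # 4 movies, 2 latent factors
user_bias = [0.2, -0.1, 0.0]
movie_bias = [1.5, 2.0, 1.0, 0.5]

def predict_rating(user_index, movie_index):
    """Dot product of the two embedding rows plus both biases (assumed model shape)."""
    # Reject anything that is not a valid 0-based row; Python would otherwise wrap negatives.
    if not (0 <= user_index < len(user_emb) and 0 <= movie_index < len(movie_emb)):
        raise IndexError("expected 0-based indices, got user=%r movie=%r" % (user_index, movie_index))
    u, m = user_emb[user_index], movie_emb[movie_index]
    return sum(a * b for a, b in zip(u, m)) + user_bias[user_index] + movie_bias[movie_index]

print(predict_rating(0, 0))
```

Here `predict_rating(1, 110)` raises, but a real embedding layer given an in-range raw ID would just return a plausible-looking number for the wrong movie, which is why mixing IDs and indices is so easy to miss.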
import numpy as np

def predict(user_index=0, movie_index=0, movie_uid=None, user_uid=None, movie_name=None):
    # Resolve the movie: a name takes priority over a UID, which takes priority over an index
    if movie_name is not None:
        movie_uid = movie_name_to_uid[movie_name]
    if movie_uid is not None:
        movie_index = movie_uid_to_idx[movie_uid]
    # Resolve the user: a UID overrides the index
    if user_uid is not None:
        user_index = user_uid_to_idx[user_uid]
    # The model was trained on indices, so it must be queried with indices
    result = model.predict([np.array([user_index]), np.array([movie_index])])
    # Translate the indices back into a user ID and a movie name for display
    print("Best Rating for user {} on movie {} is {}".format(user_idx_to_uid[user_index], MovieIndexToName(movie_index), result[0]))
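One thing that makes a function like this hard to trust is that a UID and an index are both plain integers, so a call meant for UIDs looks identical to one meant for indices. A possible alternative, sketched below, is to require exactly one identifier per side and fail loudly otherwise. `resolve_movie` and `resolve_user` are hypothetical names, and the lookup tables are stand-ins filled only with values visible in this post's outputs (user ID 1 is index 0, user index 3 is user ID 4, Braveheart is movie ID 110 at index 27).

```python
# Stand-in lookup tables (hypothetical), using only pairs that appear in the post's output.
movie_name_to_uid = {"Braveheart (1995)": 110}
movie_uid_to_idx = {110: 27}
user_uid_to_idx = {1: 0, 4: 3}

def _exactly_one(**kwargs):
    """Return the single (name, value) pair that is not None, or raise ValueError."""
    given = [(k, v) for k, v in kwargs.items() if v is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of %s, got %s" % (sorted(kwargs), [k for k, _ in given]))
    return given[0]

def resolve_movie(index=None, uid=None, name=None):
    """Turn a movie index, UID, or title into the index the model was trained on."""
    kind, value = _exactly_one(index=index, uid=uid, name=name)
    if kind == "name":
        kind, value = "uid", movie_name_to_uid[value]
    return movie_uid_to_idx[value] if kind == "uid" else value

def resolve_user(index=None, uid=None):
    """Turn a user index or UID into the index the model was trained on."""
    kind, value = _exactly_one(index=index, uid=uid)
    return user_uid_to_idx[value] if kind == "uid" else value
```

The prediction call would then be `model.predict([np.array([resolve_user(uid=1)]), np.array([resolve_movie(uid=110)])])`, and an ambiguous call such as `resolve_user(index=0, uid=1)` raises instead of silently preferring one argument.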
However, when I call my function:
predict(user_index=0, movie_index=27)
predict(user_uid=1, movie_name="Braveheart (1995)")
predict(user_uid=1, movie_uid=110)
# This represents what the notebook is really asking for
predict(user_index=3, movie_index=6)
# This represents what it intended to ask for
predict(user_uid=3, movie_uid=6)
I get these results:
# From the ratings CSV file, user ID 1 on movie ID 110 should score about a 1.0 rating.
Best Rating for user 1 on movie Braveheart (1995) is [ 2.84351969]
Best Rating for user 1 on movie Braveheart (1995) is [ 2.84351969]
Best Rating for user 1 on movie Braveheart (1995) is [ 2.84351969]
# The following result is closest to the notebook example of model.predict([np.array([3]), np.array([6])])
Best Rating for user 4 on movie Ben-Hur (1959) is [ 4.69232702]
# This was what it intended to ask
Best Rating for user 3 on movie Heat (1995) is [ 3.62401152]
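To compare a prediction with the observed rating, the lookup has to use the same kind of identifier as the model query. Because `ratings.userId` and `ratings.movieId` were overwritten with indices in the CRITICAL step, a filter like `ratings[ratings.userId == 1]` run afterwards selects the user whose index is 1, not user ID 1. The minimal sketch below uses hypothetical toy rows already stored in index form; `actual_rating` is an illustrative helper, not part of the notebook.

```python
# Hypothetical toy data, stored the way the notebook stores it after re-encoding:
# (user index, movie index, rating).
indexed_rows = [(0, 27, 1.0), (1, 2, 4.0), (3, 5, 3.0)]
userid2idx = {1: 0, 4: 3}   # raw user ID -> index
movieid2idx = {110: 27}     # raw movie ID -> index

def actual_rating(user_id, movie_id, rows=indexed_rows):
    """Return the observed rating for a raw (userId, movieId) pair, or None if unrated."""
    u, m = userid2idx[user_id], movieid2idx[movie_id]  # translate raw IDs to indices first
    for ui, mi, r in rows:
        if ui == u and mi == m:
            return r
    return None

print(actual_rating(1, 110))
```

It is also worth checking whether the pair landed in `trn` or `val`: a single prediction on one held-out pair says little on its own about overall quality.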
Am I doing this right? Maybe the model is just not that good. I’ve been disappointed by its predictive power.
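One way to judge whether the model is "not that good" is to compare its validation error with a trivial baseline, such as always predicting the mean training rating. The sketch assumes the validation ratings and the model's predictions for them are available as plain lists (for example, the flattened output of `model.predict` on the validation inputs); the numbers in the usage lines are hypothetical.

```python
import math

def rmse(predictions, targets):
    """Root-mean-squared error between two equal-length, non-empty sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))

def mean_baseline_rmse(train_ratings, val_ratings):
    """RMSE of a 'model' that always predicts the mean training rating."""
    mean = sum(train_ratings) / len(train_ratings)
    return rmse([mean] * len(val_ratings), val_ratings)

# Hypothetical numbers, purely to show usage.
train = [4.0, 3.0, 5.0, 2.0]
val = [4.0, 1.0, 5.0]
model_preds = [3.8, 2.8, 4.6]
print(rmse(model_preds, val), mean_baseline_rmse(train, val))
```

If the model's validation RMSE is not clearly below the mean baseline, it is underfitting or something in the pipeline is off; if it is, individual predictions can still miss by a point or more on some pairs.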