Speeding up CF NN (lesson 4) Predictions?

markovbling · May 12, 2017, 7:06am

Hi everyone!

I’d like to put the neural net for collaborative filtering from lesson 4 into “production” so it can produce predictions for a user.

I know that I can do:

nn.predict([np.array([u_id]), np.array([m_id])])

…to get a prediction of the rating u_id would give m_id.

Two questions:

1. The NN currently has two inputs (a movie_id and a user_id) and produces a single output (a rating). How can we change the architecture so it takes a user_id and outputs a vector of ratings, one for every movie_id?
2. Inputting a user_id and a movie_id is useful because you can iterate over the movies in a genre and produce predictions for just that genre. However, calling nn.predict() on each user/movie pair is slow. How can we speed this up?

Thanks!

jeremy · May 12, 2017, 9:45pm

1. Just pass in an array of all the movie ids you want, rather than just one id.
2. Batch up a few ids and pass them all at once as arrays.
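A minimal sketch of what passing arrays looks like, assuming the model takes [user_ids, movie_ids] as two 1-D id arrays and returns one rating per row, as in the nn.predict call above. Since a real Keras model isn't available here, toy_predict is a hypothetical numpy stand-in (random embeddings, one ReLU hidden layer) with the same call shape; predict_all_movies is also a hypothetical helper name. The trick is to repeat the user id once per movie so both arrays have the same length:

```python
import numpy as np

# Hypothetical stand-in for the lesson-4 net: concatenated embeddings -> one
# ReLU hidden layer -> one output. Random weights, illustration only.
rng = np.random.default_rng(0)
n_users, n_movies, n_factors, n_hidden = 5, 8, 4, 6
user_emb = rng.normal(size=(n_users, n_factors))
movie_emb = rng.normal(size=(n_movies, n_factors))
W1 = rng.normal(size=(2 * n_factors, n_hidden))
W2 = rng.normal(size=(n_hidden, 1))


def toy_predict(inputs):
    """Same call shape as nn.predict([user_ids, movie_ids]): returns (n, 1) ratings."""
    u_ids, m_ids = inputs
    x = np.concatenate([user_emb[u_ids], movie_emb[m_ids]], axis=1)  # (n, 2*n_factors)
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    return h @ W2  # (n, 1)


def predict_all_movies(predict_fn, u_id, movie_ids):
    """Score one user against many movies in a single batched call."""
    movie_ids = np.asarray(movie_ids)
    user_ids = np.full(len(movie_ids), u_id)  # repeat the user id once per movie
    return predict_fn([user_ids, movie_ids]).ravel()  # flatten (n, 1) -> (n,)


# Every movie for user 2 in one call, versus one call per movie
ratings = predict_all_movies(toy_predict, 2, np.arange(n_movies))
looped = np.array([toy_predict([np.array([2]), np.array([m])])[0, 0]
                   for m in range(n_movies)])
print(np.allclose(ratings, looped))  # True
```

This also covers the first question without changing the architecture: passing np.arange(n_movies) gives a vector with one rating per movie, while passing just a genre's movie ids scores only that genre. With the real model the call would be predict_all_movies(nn.predict, u_id, movie_ids), and Keras's predict also accepts a batch_size argument if the arrays get large.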

markovbling · June 7, 2017, 7:52pm

Thanks Jeremy, that worked perfectly.

In the end, I went with the CF + bias model, since I could store its latent factors in a database, which made it much easier to put into production. In case anyone is wondering, you can pull out the embedding matrix the same way: pass all the ids in one array, as per Jeremy’s answer, to a model that outputs the embedding, just like the way we extract the biases in the lesson notebook. Something like:

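import numpy as np
import pandas as pd
# get_user_embedding: a Keras Model(user_in, <user embedding>), built like the bias-extraction models in the notebook
u_target = np.arange(n_users)  # every user index in one batch
user_factorz = get_user_embedding.predict(u_target)  # shape (n_users, 1, n_factors)
user_factorz = user_factorz.reshape(user_factorz.shape[0], user_factorz.shape[2])  # drop the length-1 axis
u_f_df = pd.DataFrame(user_factorz)
u_f_df.to_csv("user_factors.csv")  # row index = user index used by the model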
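Once the factors are stored, predictions no longer need Keras at all. A minimal sketch, assuming the CF + bias model's output is just the dot product of the user and movie factors plus the user bias and movie bias, with no extra activation on top (check your own model before relying on this). The movie factors and both bias vectors here are random toy arrays standing in for tables exported the same way, and score_all_movies / top_k are hypothetical helper names. It also shows reading the CSV back: to_csv writes the index by default, so read_csv needs index_col=0:

```python
import io

import numpy as np
import pandas as pd

# Toy stand-ins for exported tables; sizes and values are illustrative only.
rng = np.random.default_rng(0)
n_users, n_movies, n_factors = 4, 6, 3

# Round-trip through CSV the same way as user_factors.csv (index written by default)
buf = io.StringIO()
pd.DataFrame(rng.normal(size=(n_users, n_factors))).to_csv(buf)
buf.seek(0)
user_factors = pd.read_csv(buf, index_col=0).to_numpy()  # index_col=0 skips the saved index

movie_factors = rng.normal(size=(n_movies, n_factors))
user_bias = rng.normal(size=n_users)
movie_bias = rng.normal(size=n_movies)


def score_all_movies(u_idx):
    """Predicted rating of every movie for one user: dot product plus both biases."""
    return movie_factors @ user_factors[u_idx] + user_bias[u_idx] + movie_bias


def top_k(u_idx, k=3):
    """Indices of the k highest-scoring movies for a user, best first."""
    scores = score_all_movies(u_idx)
    return np.argsort(scores)[::-1][:k]


print(score_all_movies(1).shape)  # (6,)
print(top_k(1))
```

A single matrix-vector product scores every movie for a user at once, which is the same batching idea as above, just done directly on the stored factors.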