I’m applying what the course teaches in videos 4 and 5 to an ensemble recommender that uses data provided by LastFM about the listening habits of 350K users.
The data is in the form of (userid, artistid, playcount) tuples.
You can find the notebook here: https://github.com/mistakenot/lasftm-ensemble-recommender/blob/master/lastfm.ipynb
I’ve gotten as far as creating the basic dot-product model and training it to make some basic predictions about what a user may like. The version without bias gives predictions that are orders of magnitude off. The version with bias is at least in the right ballpark, but still not great.
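As a rough illustration of the with-bias prediction described above (a sketch only, not the code from the notebook; the function name, vectors, and bias values are all made up for the example):

```python
def predict(user_vec, artist_vec, user_bias=0.0, artist_bias=0.0):
    """Dot product of the two embedding vectors, plus the two bias terms."""
    dot = sum(u * a for u, a in zip(user_vec, artist_vec))
    return dot + user_bias + artist_bias

# Hypothetical 2-dimensional embeddings for one user and one artist.
user_vec = [0.5, -1.0]
artist_vec = [0.2, 0.1]

no_bias = predict(user_vec, artist_vec)
with_bias = predict(user_vec, artist_vec, user_bias=0.03, artist_bias=0.02)
```

With the bias terms left at zero this reduces to the plain dot-product version, so the two variants differ only in the two extra learned scalars per user and per artist.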
I’d really appreciate it if anyone could have a quick look over the notebook and critique how I’m doing so far and how I could improve it.
One issue might be the sparsity of the data - there are a lot more NaNs in the preference grid than there are in the movie dataset, which reflects the much wider choice of artists a user can listen to. Is there a better way of dealing with this kind of data?
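To put a number on the sparsity, one option is to measure what fraction of the user-by-artist grid is actually filled in. This is a minimal sketch on made-up tuples in the (userid, artistid, playcount) shape; `grid_density` is a hypothetical helper, not something from the notebook:

```python
def grid_density(rows):
    """Fraction of (user, artist) cells that have an observed play count."""
    users = {user_id for user_id, _artist_id, _count in rows}
    artists = {artist_id for _user_id, artist_id, _count in rows}
    if not users or not artists:
        return 0.0
    observed = {(user_id, artist_id) for user_id, artist_id, _count in rows}
    return len(observed) / (len(users) * len(artists))

# Toy data: 2 users, 2 artists, 3 of the 4 possible cells observed.
sample_rows = [
    ("u1", "a1", 30),
    ("u1", "a2", 10),
    ("u2", "a1", 5),
]
density = grid_density(sample_rows)
```

Everything not counted here is a NaN in the grid, so `1 - density` is the share of missing cells.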
Furthermore, I’d be curious to hear any opinions on how I normalized the play count data - specifically, I replaced each artist’s play count with the fraction of that user’s total plays that the artist accounts for. Could there be a better way?
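For anyone who doesn’t want to open the notebook, the normalization described above amounts to something like the following. It’s a minimal sketch in plain Python on made-up tuples, not the notebook’s actual code, and `normalize_play_counts` is a hypothetical name:

```python
from collections import defaultdict

def normalize_play_counts(rows):
    """Turn (userid, artistid, playcount) into (userid, artistid, fraction of user's plays)."""
    # First pass: total plays per user.
    totals = defaultdict(int)
    for user_id, _artist_id, play_count in rows:
        totals[user_id] += play_count
    # Second pass: divide each play count by that user's total.
    # Users with zero total plays are skipped to avoid dividing by zero.
    return [
        (user_id, artist_id, play_count / totals[user_id])
        for user_id, artist_id, play_count in rows
        if totals[user_id] > 0
    ]

# Toy data: u1 has 40 plays in total, u2 has 5.
rows = [
    ("u1", "a1", 30),
    ("u1", "a2", 10),
    ("u2", "a1", 5),
]
normalized = normalize_play_counts(rows)
```

Each user’s fractions sum to 1, so a user with a single artist always gets 1.0 for that artist no matter how few plays they have.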
Thanks in advance for any help.