Python error - lesson 4


I am getting an error while following the code for lesson 4, and I am not sure whether it is a Python 2 issue, as I am still on Python 2.7.
Here is my code; please let me know what I am doing wrong.
movies = pd.read_csv(data_path+'movies.csv')
ratings = pd.read_csv(data_path+'ratings.csv')
movie_names = pd.read_csv(data_path+'movies.csv').set_index('movie_id')['movie'].to_dict()

users = ratings.user_id.unique()
movies = ratings.movie_id.unique()

userid2idx = {o:i for i,o in enumerate(users)}
movieid2idx = {o:i for i,o in enumerate(movies)}


KeyError Traceback (most recent call last)
in ()
----> 1 ratings.movieId = ratings.movie_id.apply(lambda x: movieid2idx[x])
2 ratings.userId = ratings.user_id.apply(lambda x: userid2idx[x])

/Users/trinakarmakar/anaconda2/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
2290 else:
2291 values = self.asobject
-> 2292 mapped = lib.map_infer(values, f, convert=convert_dtype)
2294 if len(mapped) and isinstance(mapped[0], Series):

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:66116)()

in (x)
----> 1 ratings.movieId = ratings.movie_id.apply(lambda x: movieid2idx[x])
2 ratings.userId = ratings.user_id.apply(lambda x: userid2idx[x])

KeyError: 1193


I noticed you are using movie_id where I am using movieId. Assuming you got to that point without a problem, the error is telling you that when movieid2idx[x] was executed with the key 1193, no matching entry was found. That shouldn't happen, so I would guess you have another issue a little further up.

I would suggest changing movies = ratings.movie_id.unique() to movies = ratings.movieId.unique(). If you look at ratings, the column is movieId, so unless you are confident that your version is correct, I would make that switch. Everywhere you have movie_id, use movieId, and where you use 'movie', use 'title' instead.

If none of this helps, try breaking the step down and printing each intermediate result, especially where you create movieid2idx. What happens if you execute movieid2idx[1193]? I would expect it to fail, but what about 1194 and 1192? Just a few things to look at.
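To make the last suggestion concrete, here is a minimal sketch of how the mapping could be checked step by step. The small ratings frame is made up for illustration (only the id 1193 comes from the traceback), the column names follow the code in the question, and movie_idx is a hypothetical name for the new column.

```python
import pandas as pd

# Made-up stand-in for ratings.csv; column names follow the question's code.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [1193, 661, 1193],
    "rating": [5, 3, 4],
})

# Build the id -> index mapping the same way as in the question.
movies = ratings.movie_id.unique()
movieid2idx = {o: i for i, o in enumerate(movies)}

# Ids present in the column but absent from the mapping (should be empty).
missing = sorted(set(ratings.movie_id) - set(movieid2idx))
print(missing)

# Check the failing key directly, and compare the column dtype with the key type.
print(1193 in movieid2idx)
print(ratings.movie_id.dtype, type(next(iter(movieid2idx))))

# Bracket assignment creates a real column; movie_idx is a hypothetical name.
ratings["movie_idx"] = ratings.movie_id.map(movieid2idx)
print(ratings)
```

If missing is not empty, or the key type does not match the column dtype (strings vs integers, for example), the mapping and the column were probably built from different data. Note also that ratings.movieId = ... does not create a new column when no column of that name exists yet, so bracket assignment is the safer form.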

Good luck fixing your issue.


Thanks for the response. My data file was downloaded from a GitHub repository that had ready-made CSV files, because the official GroupLens site provides "::"-separated files. I think that is where the problem lies: the data is corrupt. Regarding your observation about movieId vs movie_id and title vs movie, those are just the column headers in my file versus the ones Jeremy used. I used them because that is how they came in my data file, and they should not have any bearing on the error. I think my data file is wrong, and I could use some help if someone can point me to a source of comma-separated data files for this lesson. I don't think Jeremy supplies the data; maybe it is mentioned in his video, which I need to go through one more time.
However, thanks a lot for your response.
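For anyone else stuck on the "::"-separated files, a rough sketch of reading them directly with pandas instead of looking for a converted copy. It assumes the ratings file has no header row and four fields per line (user id, movie id, rating, timestamp); the column names are my own choice to match the code above, and the two sample rows only stand in for the real file.

```python
import io
import pandas as pd

# Two sample rows in the assumed "::"-separated layout: user::movie::rating::timestamp
sample = io.StringIO(u"1::1193::5::978300760\n1::661::3::978302109\n")

ratings = pd.read_csv(
    sample,              # replace with the path to the downloaded ratings file
    sep="::",
    engine="python",     # multi-character separators need the python engine
    header=None,         # assumed: the file has no header row
    names=["user_id", "movie_id", "rating", "timestamp"],  # assumed names
)
print(ratings)

# Comma-separated text for the same data; pass a filename to to_csv to save it.
csv_text = ratings.to_csv(index=False)
print(csv_text)
```

Reading the official files this way avoids depending on a third-party copy whose contents may not match the originals.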

Fixed the issue used the data from Thanks.

Great, glad you were able to get it working!