How to find title to id mapping in dataloaders from fastai.collab tutorial

chatuur · January 12, 2022, 2:23pm

I’m stepping through the collab example and there’s this behaviour in dataloaders I don’t understand. I’m not sure how to find the mapping between title and the ids being populated in the dataloader.

I’ve done:

from fastai.tabular.all import *
from fastai.collab import *

path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      usecols=(0,1,2,3), names=['user','movie','rating','timestamp'])
movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)

ratings = ratings.merge(movies)
ratings.head()

   user  movie  rating  timestamp         title
0   196    242       3  881250949  Kolya (1996)
1    63    242       3  875747190  Kolya (1996)
2   226    242       5  883888671  Kolya (1996)
3   154    242       3  879138235  Kolya (1996)
4   306    242       5  876503793  Kolya (1996)



# Passing title as item_name. So I'm assuming that the movie names are gonna be converted to ids. 
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

But now if I do:

for it in dls.train_ds.dataloaders()[0]:
    print(it[0][0],it[1][0])
    break
# I get
tensor([655, 684]) tensor([2], dtype=torch.int8)

Here I’m assuming the user is 655, the title is 684 and the rating is 2.
How do I confirm this and find which movie is mapped to 684?
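
To make the question concrete, here is a minimal sketch of the kind of mapping I'd expect to exist. It assumes fastai encodes each categorical column as its sorted unique values with '#na#' reserved at index 0; the toy titles and the helper names (id_to_title, title_to_id) are hypothetical and not part of fastai.

```python
# Hypothetical toy data standing in for ratings['title'].
titles = ["Kolya (1996)", "Toy Story (1995)", "Kolya (1996)", "Fargo (1996)"]

# Assumption: the vocab is the sorted unique values, with '#na#' at index 0
# for missing/unknown entries (this mimics, but is not, fastai's own code).
vocab = ["#na#"] + sorted(set(titles))

# Reverse lookup: title -> integer id.
o2i = {title: idx for idx, title in enumerate(vocab)}

def id_to_title(idx):
    """Return the title stored at position idx in the vocab."""
    return vocab[idx]

def title_to_id(title):
    """Return the id for a title, falling back to 0 ('#na#') if unseen."""
    return o2i.get(title, 0)

print(vocab)
print(id_to_title(2), title_to_id("Kolya (1996)"))

# With a real fastai `dls`, the equivalent lookups would presumably be
# (not executed here):
#   dls.classes['title'][684]                  # id -> title
#   dls.classes['title'].o2i['Kolya (1996)']   # title -> id
```

If that assumption holds, the second number in the tensor is an index into the title vocab rather than the original movie id from u.item, and 684 would be the 684th title in sorted order.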