I’m stepping through the collab example and there’s some DataLoaders behaviour I don’t understand: I can’t work out how to find the mapping between movie titles and the ids that end up in the batches.
I’ve done:
from fastai.tabular.all import *
from fastai.collab import *
path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      usecols=(0,1,2), names=['user','movie','rating'])
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)
ratings = ratings.merge(movies)
   user  movie  rating  timestamp         title
0   196    242       3  881250949  Kolya (1996)
1    63    242       3  875747190  Kolya (1996)
2   226    242       5  883888671  Kolya (1996)
3   154    242       3  879138235  Kolya (1996)
4   306    242       5  876503793  Kolya (1996)
# Passing title as item_name. So I'm assuming that the movie names are gonna be converted to ids.
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
But now if I do:
for it in dls.train_ds.dataloaders()[0]:
    print(it[0][0], it[1][0])
    break
# I get
tensor([655, 684]) tensor([2], dtype=torch.int8)
Here I’m assuming user is 655, title is 684 and rating is 2.
How do I confirm this and find which movie is mapped to 684?
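
To make the question concrete, here is a minimal sketch of the kind of lookup I'm after. It is plain Python rather than fastai: the `titles` list is made-up toy data, and the encoding scheme (a `#na#` placeholder at index 0 followed by the sorted unique values) is an assumption about how the categorical column gets turned into ids, not something I have verified against the library source.

```python
# Toy data standing in for the 'title' column (hypothetical values).
titles = ["Kolya (1996)", "Toy Story (1995)", "Kolya (1996)", "Fargo (1996)"]

# Assumed encoding: index 0 is a missing-value placeholder,
# followed by the sorted unique titles.
vocab = ["#na#"] + sorted(set(titles))              # id -> title
o2i = {title: i for i, title in enumerate(vocab)}   # title -> id

# Encode the column the way the batches would presumably see it.
encoded = [o2i[t] for t in titles]
print(encoded)

# Decode an id back to its title, which is the direction I need.
print(vocab[encoded[0]])
```

If the DataLoaders expose their vocabulary the way the collab tutorial uses it (`dls.classes['title']`, with an `o2i` dict for the reverse direction), then `dls.classes['title'][684]` would presumably be the title I'm looking for, and `dls.classes['user'][655]` the original user id.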