Unable to visualize embedding distances

shay1309 · December 28, 2020, 2:13pm

I’m using the out-of-the-box fastai collab learner:

learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))

When I try to visualize the embedding distance:
g = ratings.groupby('title')['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_idxs = tensor([learn.dls.classes['title'].o2i[m] for m in top_movies])
movie_w = learn.model.i_weight[top_idxs].cpu().detach()
movie_pca = movie_w.pca(3)
fac0,fac1,fac2 = movie_pca.t()
idxs = list(range(50))
X = fac0[idxs]
Y = fac2[idxs]
plt.figure(figsize=(12,12))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x,y,i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()

I run into the issue shown in the picture …

I adapted the code that was originally given so that it works with this model. The original code was:
g = ratings.groupby('title')['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_idxs = tensor([learn.dls.classes['title'].o2i[m] for m in top_movies])
movie_w = learn.model.movie_factors[top_idxs].cpu().detach()
movie_pca = movie_w.pca(3)
fac0,fac1,fac2 = movie_pca.t()
idxs = list(range(50))
X = fac0[idxs]
Y = fac2[idxs]
plt.figure(figsize=(12,12))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x,y,i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()

wdsandall · June 20, 2024, 10:49pm

I had the same issue when trying to produce the PCA matrix, but with the fastai.collab model rather than the “create your own” DotProductBias-based module.

That class has the attribute “movie_factors” which our fastai.collab learn object doesn’t have:

self.movie_factors = create_params([n_movies, n_factors])

I found a clue to the solution a little further down in the Embedding Distance section of the 2022 course (Lecture 8), which shows that our movie_factors is just:

movie_factors = learn.model.i_weight.weight

When I replaced the line that was causing an error with this it worked:

movie_w = learn.model.i_weight.weight[top_idxs].cpu().detach()

(So in OP’s attempted fix, it was just missing the .weight)
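
For anyone who wants to see why the .weight matters, here is a minimal sketch in plain PyTorch. It assumes learn.model.i_weight behaves like a standard nn.Embedding; the sizes and the i_weight stand-in below are made up for illustration and are not taken from the actual learner.

```python
import torch
from torch import nn

# Hypothetical stand-in for learn.model.i_weight: 10 items, 4 factors each.
i_weight = nn.Embedding(10, 4)
top_idxs = torch.tensor([1, 3, 5])

# Indexing the module itself fails, because nn.Module has no __getitem__.
try:
    i_weight[top_idxs]
    module_indexable = True
except TypeError:
    module_indexable = False

# The learned factors live in the .weight parameter, which can be indexed
# like any other tensor.
movie_w = i_weight.weight[top_idxs].cpu().detach()

# Calling the module does the same row lookup through its forward pass.
same_rows = torch.equal(movie_w, i_weight(top_idxs).detach())

print(module_indexable, movie_w.shape, same_rows)
```

With a real learner, movie_w would then go into the same .pca(3) and plotting lines as in the original code.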