Hey folks,
I have been trying to use the learned embeddings of the categorical features of the Blue Book for Bulldozers data, as discussed in Lesson 6 of the course.
Here is the notebook:
The function that replaces the categorical columns with their embeddings (copied from the forum) is:
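from fastai.tabular.all import *  # brings in pd, torch and fastai's tensor()

def add_embeds(learn, x):
    "Replace each categorical column in `cat_nn` with its learned embedding columns."
    x = x.copy()
    for i, cat in enumerate(cat_nn):  # cat_nn: categorical column names used to build `learn`
        emb = learn.embeds[i]
        # indices must be the integer codes the model was trained on, in range(emb.num_embeddings)
        vec = tensor(x[cat], dtype=torch.int64).to(emb.weight.device)
        with torch.no_grad():  # inference only; also lets the result be converted to NumPy
            emb_data = emb(vec).cpu().numpy()
        emb_names = [f'{cat}_{j}' for j in range(emb_data.shape[1])]
        emb_df = pd.DataFrame(emb_data, index=x.index, columns=emb_names)
        x = x.drop(columns=cat)
        x = x.join(emb_df)
    return x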
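To see what the function does in isolation, here is a minimal sketch of the same replacement without fastai, assuming the categorical columns already contain integer codes starting at 0. The `add_embeds_toy` name, its explicit `embeds` and `cat_names` arguments, and the toy DataFrame are hypothetical illustrations, not part of the original code.

```python
import pandas as pd
import torch
import torch.nn as nn

def add_embeds_toy(embeds, cat_names, x):
    """Replace each column in cat_names with the rows of its embedding matrix.

    Hypothetical variant of add_embeds that takes the embedding layers and
    column names as arguments instead of reading learn.embeds and cat_nn.
    """
    x = x.copy()
    for emb, cat in zip(embeds, cat_names):
        vec = torch.tensor(x[cat].to_numpy(), dtype=torch.int64)
        with torch.no_grad():  # no gradients needed, and allows .numpy()
            emb_data = emb(vec).numpy()
        emb_names = [f'{cat}_{j}' for j in range(emb_data.shape[1])]
        emb_df = pd.DataFrame(emb_data, index=x.index, columns=emb_names)
        x = x.drop(columns=cat).join(emb_df)
    return x

torch.manual_seed(0)
embeds = nn.ModuleList([nn.Embedding(3, 2), nn.Embedding(4, 3)])
df = pd.DataFrame({'a': [0, 2, 1], 'b': [3, 0, 1], 'num': [0.5, 1.5, 2.5]})
out = add_embeds_toy(embeds, ['a', 'b'], df)
print(list(out.columns))  # ['num', 'a_0', 'a_1', 'b_0', 'b_1', 'b_2']
```

Each code in a column selects one row of that column's embedding matrix, so a column with `k`-dimensional embeddings becomes `k` new float columns.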
I then call it with the learner and the dataset:
xs_emb = add_embeds(learn, df_nn_final)
I keep getting an index-out-of-range error:
IndexError: index out of range in self
It is thrown by this line:
emb_data = emb(vec)
vec = tensor(x[cat], dtype=torch.int64) is a tensor whose length equals the number of rows in the dataset, torch.Size([412698]), while the first embedding's weight has shape torch.Size([73, 18]).
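For context on those shapes: an `nn.Embedding(73, 18)` layer accepts an index tensor of any length, so 412698 rows is not a problem by itself. What it requires is that every value lies in `range(73)`; an index of 73 or more triggers `IndexError: index out of range in self`. A small check along these lines could show which columns hold such values before `add_embeds` is called. The helper name `find_out_of_range` and the toy data are hypothetical.

```python
import pandas as pd
import torch
import torch.nn as nn

def find_out_of_range(x, cat_names, embeds):
    """Return {column: (min, max, num_embeddings)} for every column whose
    values fall outside range(num_embeddings) of its embedding layer."""
    bad = {}
    for emb, cat in zip(embeds, cat_names):
        lo, hi = int(x[cat].min()), int(x[cat].max())
        if lo < 0 or hi >= emb.num_embeddings:
            bad[cat] = (lo, hi, emb.num_embeddings)
    return bad

# Toy data: 'codes' holds valid indices, 'raw' holds values too large to be codes.
embeds = nn.ModuleList([nn.Embedding(73, 18), nn.Embedding(73, 18)])
df = pd.DataFrame({'codes': [0, 5, 72], 'raw': [80, 100, 250]})
print(find_out_of_range(df, ['codes', 'raw'], embeds))  # {'raw': (80, 250, 73)}

# The length of the index tensor does not matter; only its values do.
emb = nn.Embedding(73, 18)
print(emb(torch.zeros(412698, dtype=torch.int64)).shape)  # torch.Size([412698, 18])
# emb(torch.tensor([73]))  # would raise IndexError: 73 is not in range(73)
```

If the DataFrame passed in contains raw values rather than the integer codes the model was trained on, a check like this would flag those columns.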
Can anyone take a look, please?