Setting n_factors in collaborative filtering?

Hello all,
I would like to ask about n_factors when building a collaborative filtering model.
Can I choose any number for n_factors? In the lesson, he used 5 or 50 for n_factors.
Also, could you explain what role n_factors plays?

Thank you so much for your time!

n_factors is just the size of the embedding vector for each element (each user and each movie). In the tutorial images, there are, say, x rows, and each row has 5 features. Those 5 features are the n_factors. They are the learned values that describe a movie.
More factors give the model more capacity to capture detail, which can improve accuracy, but they also make it harder to train. You should experiment with different values of n_factors and see what results you get.
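To make the "rows with 5 features" picture concrete, here is a minimal NumPy sketch. It assumes a plain dot-product model with no bias terms, and the sizes and the `predict` helper are made up for illustration; it is not the lesson's actual code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_users, n_movies, n_factors = 4, 6, 5

rng = np.random.default_rng(0)

# One row per user / movie, one column per factor.
# In a real model these values are learned during training.
user_factors = rng.normal(scale=0.1, size=(n_users, n_factors))
movie_factors = rng.normal(scale=0.1, size=(n_movies, n_factors))

def predict(user_id, movie_id):
    """Predicted score = dot product of the two n_factors-long rows."""
    return float(user_factors[user_id] @ movie_factors[movie_id])

print(user_factors.shape, movie_factors.shape)
print(predict(0, 2))
```

Changing n_factors only changes the number of columns in both matrices, so each user and movie gets described by more (or fewer) learned numbers.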

Thank you so much for your reply!

To be honest, I couldn’t fully follow what you said. Does that mean I would need more data to train if I wanted to increase n_factors?

Thanks in advance.

The ideal number of factors (the embedding size) depends on the complexity of the data. Using too many can lead to overfitting, while using too few may not be enough to capture all the patterns and features in the underlying data. So a larger n_factors is not always better.

The difficulty in choosing the right n_factors therefore comes down to over- or underfitting the model to your data. That balance is different in every case, because every scenario has different data, so you have to experiment.
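One way to run that experiment is to train the same model with several n_factors values and compare the error on held-out ratings. The sketch below is only an illustration under stated assumptions: the ratings are synthetic, the model is a bare dot-product factorization trained with plain gradient descent, and `fit_and_score` and all the hyperparameters are hypothetical choices, not anything from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ratings (an assumption for illustration): true rank 3 plus noise.
n_users, n_items, true_rank = 60, 40, 3
ratings = rng.normal(size=(n_users, true_rank)) @ rng.normal(size=(true_rank, n_items))
ratings += 0.1 * rng.normal(size=ratings.shape)

# Only about 30% of ratings are observed; roughly 20% of those are held out.
observed = rng.random(ratings.shape) < 0.3
is_valid = observed & (rng.random(ratings.shape) < 0.2)
is_train = observed & ~is_valid

def fit_and_score(n_factors, epochs=500, lr=0.01, wd=0.01):
    """Fit user/item factors on the training ratings; return (train, valid) RMSE."""
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    for _ in range(epochs):
        # Prediction error on training entries only (zero elsewhere).
        err = np.where(is_train, U @ V.T - ratings, 0.0)
        # Simultaneous gradient step with a small weight-decay term.
        U, V = U - lr * (err @ V + wd * U), V - lr * (err.T @ U + wd * V)
    sq_err = (U @ V.T - ratings) ** 2
    return np.sqrt(sq_err[is_train].mean()), np.sqrt(sq_err[is_valid].mean())

results = {k: fit_and_score(k) for k in (1, 3, 10, 30)}
for k, (train_rmse, valid_rmse) in results.items():
    print(f"n_factors={k:2d}  train RMSE={train_rmse:.3f}  valid RMSE={valid_rmse:.3f}")
```

The number to watch is the validation RMSE: a training error that keeps falling while the validation error stalls or rises is the usual sign that n_factors is larger than the data can support.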
