Is there an issue where the bias components are overwhelmingly determined by the “non-experts” in a genre?
For instance, you used two different phrasings for high and low biases:
If the bias is low, you state that “even if you like this type of film,” you wouldn’t like this one
If the bias is high, the opposite: “even if you don’t like this type of film,” you would like this one
It seems like the population in statement 2 is going to be much larger, and would therefore have a much larger impact? I can’t put my finger on it, but it feels vaguely related to fast-food chains getting 5 out of 5 stars on Yelp… even though a burger expert might have a pointedly different opinion?
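For context, the setup I have in mind is the usual dot-product-plus-bias model from the lesson; here is a minimal PyTorch sketch (the class and variable names are my own):

```python
import torch
import torch.nn as nn

class DotProductBias(nn.Module):
    """Dot-product collaborative filtering with per-user and per-movie biases."""
    def __init__(self, n_users, n_movies, n_factors=50):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.movie_bias = nn.Embedding(n_movies, 1)

    def forward(self, user_ids, movie_ids):
        # Interaction term: how well this user's tastes match this movie's factors
        dot = (self.user_factors(user_ids) * self.movie_factors(movie_ids)).sum(dim=1)
        # Bias terms: the movie bias is the "even if you (don't) like this
        # type of film" component the question is about
        return dot + self.user_bias(user_ids).squeeze(1) + self.movie_bias(movie_ids).squeeze(1)
```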
Is collaborative filtering related in any way to correspondence analysis (which also represents both the rows and columns of a matrix in the same embedding space)?
If I recall correctly, PCA is just a single-layer autoencoder with a linear activation function (so technically not a neural network, since the activation function is not non-linear)…
Basically, autoencoders in general are a generalization of PCA: both learn some function (PCA only a linear one) to map a high-dimensional dataset to a low-dimensional embedding space…
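If I have that right, the PCA-equivalent autoencoder would look something like this minimal PyTorch sketch (my own example; the equivalence assumes MSE loss and centered data):

```python
import torch
import torch.nn as nn

# A single-hidden-layer autoencoder with no non-linearity: trained with MSE
# loss on centered data, it learns the same subspace PCA would find
# (though not necessarily the same orthonormal basis).
class LinearAutoencoder(nn.Module):
    def __init__(self, n_features, n_components):
        super().__init__()
        self.encoder = nn.Linear(n_features, n_components, bias=False)
        self.decoder = nn.Linear(n_components, n_features, bias=False)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = LinearAutoencoder(n_features=100, n_components=10)
x = torch.randn(32, 100)
loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss
```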
I don’t really get why we need 4 columns (latent variables?) in the word-embedding matrix when we already have one-dimensional unique identifiers, the indices for each word in the vocabulary. I have a gap in my mental picture.
Well, the embeddings are usually learned such that there is actual semantic meaning behind them.
For example, the famous “word2vec” embeddings are known to have some interesting properties where, for example, you can do math with the embeddings for the words:
“king - man + woman”
and get an embedding close to the one for “queen”
You can imagine that having that semantic information would be much more helpful to a neural network model than the unique indices…
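If you want to try this yourself, here’s a quick sketch using gensim’s pretrained Google News vectors (note that the download is large on first run):

```python
# Requires: pip install gensim (the pretrained vectors are a large download)
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")
# "king - man + woman" -> the nearest neighbours should include "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```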
Can anyone recommend or share a notebook that demonstrates a good implementation of the two-step process described above, combining word embeddings with a boosted tree or random forest, using the fastai library?
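To show what I mean, here’s a rough sketch of the two-step idea (my own minimal example, using a plain PyTorch embedding layer and scikit-learn rather than anything fastai-specific):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

# Step 1: suppose emb was trained inside a neural net (e.g. a fastai tabular
# or language model); here it is just randomly initialised for illustration.
vocab_size, emb_dim = 1000, 8
emb = nn.Embedding(vocab_size, emb_dim)

# A categorical column encoded as integer indices, plus a toy target.
idx = torch.randint(0, vocab_size, (500,))
y = np.random.randn(500)

# Step 2: swap each index for its learned embedding vector and feed the
# result to a tree-based model as ordinary tabular features.
X = emb(idx).detach().numpy()
rf = RandomForestRegressor(n_estimators=100).fit(X, y)
```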