Is there an issue where the bias components are overwhelmingly determined by the “non-experts” in a genre?
For instance, you used two different phrasings for high and low biases:
If the bias is low, you state that “even if you like this type of film,” you wouldn’t like this one
If the bias is high, the opposite: “even if you don’t like this type of film,” you would like this one
It seems like the population in statement 2 is going to be much larger, and would therefore have a much larger impact? I can’t put my finger on it, but it feels vaguely related to fast-food chains getting 5 out of 5 stars on Yelp… even though a burger expert might have a pointedly different opinion?
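For context, the setup I have in mind is the usual dot-product-plus-bias model from the lesson; here is a minimal PyTorch sketch (the class and variable names are my own):

```python
import torch
import torch.nn as nn

class DotProductBias(nn.Module):
    """Dot-product collaborative filtering with per-user and per-movie biases."""
    def __init__(self, n_users, n_movies, n_factors=50):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.movie_bias = nn.Embedding(n_movies, 1)

    def forward(self, user_ids, movie_ids):
        # Interaction term: how well this user's tastes match this movie's factors
        dot = (self.user_factors(user_ids) * self.movie_factors(movie_ids)).sum(dim=1)
        # Bias terms: the movie bias is the "even if you (don't) like this
        # type of film" component the question is about
        return dot + self.user_bias(user_ids).squeeze(1) + self.movie_bias(movie_ids).squeeze(1)
```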
Is collaborative filtering related in any way to correspondence analysis (which also represents both the rows and columns of a matrix in the same embedding space)?
If I recall correctly, PCA is just a single-layer autoencoder with a linear activation function (so technically not a neural network, since the activation function is not non-linear)…
Basically, autoencoders in general are a generalization of PCA: both learn some function (PCA only a linear one) to map a high-dimensional dataset to a low-dimensional embedding space…
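If I have that right, the PCA-equivalent autoencoder would look something like this minimal PyTorch sketch (my own example; the equivalence assumes MSE loss and centered data):

```python
import torch
import torch.nn as nn

# A single-hidden-layer autoencoder with no non-linearity: trained with MSE
# loss on centered data, it learns the same subspace PCA would find
# (though not necessarily the same orthonormal basis).
class LinearAutoencoder(nn.Module):
    def __init__(self, n_features, n_components):
        super().__init__()
        self.encoder = nn.Linear(n_features, n_components, bias=False)
        self.decoder = nn.Linear(n_components, n_features, bias=False)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = LinearAutoencoder(n_features=100, n_components=10)
x = torch.randn(32, 100)
loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss
```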
I don’t really get why we need 4 columns (latent variables?) in the word-embedding matrix when we already have one-dimensional unique identifiers, the indices for each word in the vocabulary. I have a gap in my mental picture.
Well, the embeddings are usually learned such that there is actual semantic meaning behind them.
For example, the famous “word2vec” embeddings are known to have some interesting properties where, for example, you can do math with the embeddings for the words:
“king - man + woman”
and get an embedding close to the one for “queen”
You can imagine that having that semantic information would be much more helpful to a neural network model than the unique indices…
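If you want to try this yourself, here’s a quick sketch using gensim’s pretrained Google News vectors (note that the download is large on first run):

```python
# Requires: pip install gensim (the pretrained vectors are a large download)
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")
# "king - man + woman" -> the nearest neighbours should include "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```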
Can anyone recommend or share a notebook that demonstrates a good implementation of the two-step process described above, combining word embeddings with a boosted tree or random forest, using the fastai library?
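To show what I mean, here’s a rough sketch of the two-step idea (my own minimal example, using a plain PyTorch embedding layer and scikit-learn rather than anything fastai-specific):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

# Step 1: suppose emb was trained inside a neural net (e.g. a fastai tabular
# or language model); here it is just randomly initialised for illustration.
vocab_size, emb_dim = 1000, 8
emb = nn.Embedding(vocab_size, emb_dim)

# A categorical column encoded as integer indices, plus a toy target.
idx = torch.randint(0, vocab_size, (500,))
y = np.random.randn(500)

# Step 2: swap each index for its learned embedding vector and feed the
# result to a tree-based model as ordinary tabular features.
X = emb(idx).detach().numpy()
rf = RandomForestRegressor(n_estimators=100).fit(X, y)
```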