Lesson 4 official topic

Check out @ste’s comment.

1 Like

It’s not only the token but also the context - in this case, the ‘_an’ that starts ‘anatinus’ is in a different context from the article ‘_an’ that’s followed by another word-start token (one beginning with an underscore).
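You can see this with the tokenizer itself. Here’s a quick sketch - it assumes the microsoft/deberta-v3-small checkpoint from the lesson notebook, and the exact pieces will depend on that tokenizer’s vocabulary:

from transformers import AutoTokenizer

tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')

# the same '_an' piece can show up both as the standalone article 'an' and as
# the start of a longer word - only the surrounding tokens tell them apart
print(tokz.tokenize('ornithorhynchus anatinus'))
print(tokz.tokenize('an ornithorhynchus'))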

4 Likes

You’re actually missing the ‘context’ column. Check your code and be sure you’ve populated the df properly.
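For reference, here’s a minimal sketch of how the frame gets populated - it assumes the train.csv from the U.S. Patent Phrase to Phrase Matching competition, unzipped locally:

from pathlib import Path
import pandas as pd

path = Path('us-patent-phrase-to-phrase-matching')   # wherever you put the competition data
df = pd.read_csv(path/'train.csv')                   # columns: id, anchor, target, context, score
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor
df.input.head()

If the ‘context’ column is missing or full of NaNs, the string concatenation above will fail or produce NaN inputs.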

3 Likes

Hi ste,

Is this right? df['input'] = 'TEXT1: ' + df['context'] + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor

I’m getting this error:

Maybe I’m blind but ‘_an’ is not part of ‘ornithorhynchus’, so I’m struggling to understand the question.

But generally, the model is not looking at a token in ‘isolation’ but rather in the context of the other tokens. The meaning of a sentence can change dramatically based on even its final word. The transformer model learns to understand a token by its relationships to all the other tokens in the input.
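If it helps to see the mechanics, here’s a toy sketch of the scaled dot-product self-attention step that does this mixing (random weights, purely to show the shapes):

import torch
import torch.nn.functional as F

x = torch.randn(5, 16)                          # 5 tokens, 16-dim embeddings (random stand-ins)
Wq, Wk, Wv = (torch.randn(16, 16) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = F.softmax(q @ k.T / 16 ** 0.5, dim=-1)   # how strongly each token attends to every other
out = attn @ v                                  # each output row blends information from all 5 tokens

Every token’s output depends on every other token, which is why even the final word can change the representation of all the earlier ones.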

4 Likes

Run the notebook sequentially, from top to bottom, up to the point you’re at…

3 Likes

That took me a moment too - it’s part of ‘anatinus’!

3 Likes

Could you comment on double descent in the context of the polynomial example of overfitting?

3 Likes

I think it’s an area of active research actually, it’s a little weird lol

7 Likes

I think there was even talk of triple descent in some cases… Maybe it’s a sort of general phenomenon, i.e., N-descent?

2 Likes

Question: Has anyone discovered any interesting insights about intermediate layers of NNs for NLP tasks? Perhaps something like a Wordle-style heatmap across word probabilities, or some kind of etymological inheritance / relationship?

2 Likes

In transformers, a process known as self-attention is applied to each token so that its numerical representation includes information about other tokens in the sequence. In this way, transformers learn how to incorporate context so that “I ate a burger” and “I ate it big time when I got on the dirt bike” will result in different representations for “I ate” in each of those sequences.
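You can check this empirically. Here’s a rough sketch (the checkpoint is just an illustrative choice, not the one from the lesson) comparing the contextual vectors a pretrained model gives ‘ate’ in those two sentences:

import torch
from transformers import AutoTokenizer, AutoModel

tokz = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

def vec_for(sent, word):
    toks = tokz(sent, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**toks).last_hidden_state[0]
    # find the word's position and return its contextual vector
    idx = tokz.convert_ids_to_tokens(toks.input_ids[0]).index(word)
    return hidden[idx]

a = vec_for('I ate a burger', 'ate')
b = vec_for('I ate it big time when I got on the dirt bike', 'ate')
print(torch.cosine_similarity(a, b, dim=0))   # typically well below 1.0: same word, different vectors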

The best resource imo for understanding this, and the transformer architecture in general, is here: The Illustrated Transformer

15 Likes

I re-ran the notebook and still got the same error. I’m not sure what I’m doing wrong…

Brilliant resource – I’m going to devour that! Thanks for posting.

1 Like

You downloaded the wrong file (I think you actually have Jeremy’s submission file for the competition from his notebook).

2 Likes

Weird! Please type this in a new cell:

!cat {path/'train.csv'} | head

…wanna be sure you’re loading the right file.

1 Like

See, this is the file you got:

You want the train.csv file from the Kaggle competition:

3 Likes

… best explanation of what self-attention is, and why it’s the heart of understanding the transformer.

4 Likes

To gild @wgpubs’ lily, another fantastic beginner resource to start understanding Transformers under the hood is this post / video by Dale Markowitz over at Google:

and

I can thoroughly recommend all of Dale’s work – she’s great at getting folks started with real-world projects in a tractable way:

12 Likes