Lesson 4 official topic

Check out @ste’s comment.

1 Like

It’s not only the token but also the context - in this case, the ‘_an’ that starts ‘anatinus’ is in a different context from the article ‘_an’ that’s followed by another word-start token (one beginning with an underscore).
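You can see this with the tokenizer itself. Here’s a quick sketch - it assumes the microsoft/deberta-v3-small checkpoint from the lesson notebook, and the exact pieces will depend on that tokenizer’s vocabulary:

from transformers import AutoTokenizer

tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')

# the same '_an' piece can show up both as the standalone article 'an' and as
# the start of a longer word - only the surrounding tokens tell them apart
print(tokz.tokenize('ornithorhynchus anatinus'))
print(tokz.tokenize('an ornithorhynchus'))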

4 Likes

You’re actually missing the ‘context’ column. Check your code and be sure you’ve populated the df properly.
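For reference, here’s a minimal sketch of how the frame gets populated - it assumes the train.csv from the U.S. Patent Phrase to Phrase Matching competition, unzipped locally:

from pathlib import Path
import pandas as pd

path = Path('us-patent-phrase-to-phrase-matching')   # wherever you put the competition data
df = pd.read_csv(path/'train.csv')                   # columns: id, anchor, target, context, score
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor
df.input.head()

If the ‘context’ column is missing or full of NaNs, the string concatenation above will fail or produce NaN inputs.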

3 Likes

Hi ste,

Is this right? df['input'] = 'TEXT1: ' + df['context'] + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor

I’m getting this error:

Maybe I’m blind but ‘_an’ is not part of ‘ornithorhynchus’, so I’m struggling to understand the question.

But generally, the model is not looking at a token in ‘isolation’ but rather in the context of the other tokens. The meaning of a sentence can change dramatically based on even its final word. The transformer model learns to understand a token by its relationships to all the other tokens in the input.
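If it helps to see the mechanics, here’s a toy sketch of the scaled dot-product self-attention step that does this mixing (random weights, purely to show the shapes):

import torch
import torch.nn.functional as F

x = torch.randn(5, 16)                          # 5 tokens, 16-dim embeddings (random stand-ins)
Wq, Wk, Wv = (torch.randn(16, 16) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = F.softmax(q @ k.T / 16 ** 0.5, dim=-1)   # how strongly each token attends to every other
out = attn @ v                                  # each output row blends information from all 5 tokens

Every token’s output depends on every other token, which is why even the final word can change the representation of all the earlier ones.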

4 Likes

Run the notebook sequentially, from top to bottom, up to the point you’re at…

3 Likes

That took me a moment too - it’s part of ‘anatinus’!

3 Likes

Could you comment on double descent in the context of the polynomial example of overfitting?

3 Likes

I think it’s an area of active research actually, it’s a little weird lol

7 Likes

I think there was even talk of triple descent in some cases… Maybe it’s a sort of general phenomenon, i.e., N-descent?

2 Likes

Question: Has anyone discovered any interesting insights about intermediate layers of NNs for NLP tasks? Perhaps something like a Wordle-style heatmap across word probabilities, or some kind of etymological inheritance / relationship?

2 Likes

In transformers, a process known as self-attention is applied to each token so that its numerical representation includes information about other tokens in the sequence. In this way, transformers learn how to incorporate context so that “I ate a burger” and “I ate it big time when I got on the dirt bike” will result in different representations for “I ate” in each of those sequences.
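You can check this empirically. Here’s a rough sketch (the checkpoint is just an illustrative choice, not the one from the lesson) comparing the contextual vectors a pretrained model gives ‘ate’ in those two sentences:

import torch
from transformers import AutoTokenizer, AutoModel

tokz = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

def vec_for(sent, word):
    toks = tokz(sent, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**toks).last_hidden_state[0]
    # find the word's position and return its contextual vector
    idx = tokz.convert_ids_to_tokens(toks.input_ids[0]).index(word)
    return hidden[idx]

a = vec_for('I ate a burger', 'ate')
b = vec_for('I ate it big time when I got on the dirt bike', 'ate')
print(torch.cosine_similarity(a, b, dim=0))   # typically well below 1.0: same word, different vectors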

The best resource imo for understanding this, and the transformer architecture in general, is here: The Illustrated Transformer

15 Likes

I re-ran the notebook and still got the same error. I’m not sure what I’m doing wrong…

Brilliant resource – I’m going to devour that! Thanks for posting.

1 Like

You downloaded the wrong file (I think you actually have Jeremy’s submission file for the competition from his notebook).

2 Likes

Weird! Please type this in a new cell:

!cat {path/'train.csv'} | head

…wanna be sure you’re loading the right file.

1 Like

See, this is the file you got:

You want the train.csv file from the Kaggle competition:

3 Likes

… best explanation of what self-attention is, and why it’s the heart of understanding the transformer.

4 Likes

To gild @wgpubs’ lily, another fantastic beginner resource to start understanding Transformers under the hood is this post / video by Dale Markowitz over at Google:

and

I can thoroughly recommend all of Dale’s work – she’s great at getting folks started with real-world projects in a tractable way:

12 Likes