Data preparation for a language model

(Nate) #1

I’m trying to understand how the data preparation for a language model works. The `get_batch` function in `LanguageModelLoader` returns `data[i:i+seq_len], data[i+1:i+1+seq_len].contiguous().view(-1)`.

I know the purpose of the model is to predict the next word given the preceding sequence, so I was expecting a sample to be a sequence and the label to be the word following that sequence, say data[0:50] and data[50]. However, it seems that the sample and label have the same length, just shifted over 1, so something like data[0:50] and data[1:51]. I can’t quite wrap my mind around how this is working.


(Michael) #2

The language model tries to predict the next word after each word in the sequence, not just the one word after the whole sequence. That’s why the target is the input shifted by one position: at every position in the input, the label is the token that comes next, so a single sample provides `seq_len` training signals instead of one.
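A minimal sketch of this shifted-by-one batch preparation (assuming `data` is a flat sequence of token ids and using hypothetical values `i = 0`, `seq_len = 5`):

```python
# Toy token stream standing in for the concatenated, numericalized corpus.
data = list(range(10))  # tokens 0..9
i, seq_len = 0, 5       # hypothetical batch offset and sequence length

x = data[i : i + seq_len]          # input:  [0, 1, 2, 3, 4]
y = data[i + 1 : i + 1 + seq_len]  # target: [1, 2, 3, 4, 5]

# At each position t, the model sees x[: t + 1] and must predict y[t],
# i.e. the very next token — so every position contributes to the loss.
for t in range(seq_len):
    print(f"given {x[: t + 1]}, predict {y[t]}")
```

This is why sample and label have the same length: the loss is computed at every position, which gives the model far more gradient signal per batch than predicting only the final word.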

Here you can find a nice illustration of this language-model setup on the left side of the figure:
