Understanding of the language model

I was trying to understand the language model as explained in lesson 4.

What does the model actually try to do? Are we trying to predict the word-35 on the basis of word-12. Predict the word-7 on the basis of word-567. We do 64 of these and then
Predict word-227 on the basis of word-12 and word-35 and so on? In the process we learn the word embeddings. I don’t exactly understand the training process. Can someone explain? Also, can some tell me about the reference paper which explains this implemetation?