Language Model - Understanding Predictions

Hi,

This may be a very basic question but I am a bit confused about it.

So, when we create an LM, we provide some input representation (of 3 chars, 8 chars, etc.) and the model predicts the next character (or next word). We compare this predicted char/word with the actual one and calculate the loss, then do BPTT to compute the gradients and update the weights.
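To make that loop concrete, here is a minimal sketch in plain Python. It is not how a real LM is built: the "model" is just a hypothetical table of scores (one row per input character) instead of an RNN, and the vocabulary, learning rate, and names (`weights`, `predict_probs`, `train_step`) are made up for illustration. It only shows the three steps: predict a probability for each possible next char, compute the cross-entropy loss against the actual next char, and nudge the weights along the negative gradient.

```python
import math

def softmax(logits):
    # Turn raw scores into probabilities that sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["a", "b", "c"]

# Hypothetical toy "model": one row of scores (logits) per input character.
# A real LM would compute these scores with an RNN over the whole input.
weights = {ch: [0.0] * len(vocab) for ch in vocab}

def predict_probs(prev_char):
    # Conditional distribution P(next char | prev char).
    return softmax(weights[prev_char])

def train_step(prev_char, next_char, lr=0.5):
    probs = predict_probs(prev_char)
    target = vocab.index(next_char)
    # Cross-entropy loss: small when the actual next char gets high probability.
    loss = -math.log(probs[target])
    # For softmax + cross-entropy, d(loss)/d(logit_i) = probs[i] - onehot[i].
    for i in range(len(vocab)):
        grad = probs[i] - (1.0 if i == target else 0.0)
        weights[prev_char][i] -= lr * grad
    return loss

# Train on a single example: after "a", the actual next char is "b".
before = predict_probs("a")[vocab.index("b")]
losses = [train_step("a", "b") for _ in range(20)]
after = predict_probs("a")[vocab.index("b")]

print("P(b | a) before training:", before)
print("P(b | a) after training: ", after)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Before training every char is equally likely; each update raises the score of the char that actually followed and lowers the others, so the predicted probability of the right answer goes up and the loss goes down.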

Now my question is: how exactly does the model come up with a prediction based on the input? And how does this prediction change after optimisation (i.e. how exactly is the model learning)?

I understand it's a conditional probability based on the input, but can anyone help with an example or point to a resource?


I have a follow-up question for this topic. Why is it that if I repeat a prediction with the same input, I get different outputs? For example:
learn.predict("flarple", n_words=2) yields 'flarple and', and then running that same cell again yields 'flarple stopped three'. Any clues on why we would get different outputs?
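One likely explanation, as far as I know, is that text generation usually samples the next word from the predicted probability distribution instead of always taking the single most probable word, so repeated calls can differ. I have not checked exactly what learn.predict does here, so treat that as an assumption. The sketch below uses a made-up vocabulary and made-up probabilities (nothing here comes from a real model) to show the difference between greedy picking and sampling.

```python
import random

# Made-up next-word distribution for some input; not from a real model.
vocab = ["and", "stopped", "three", "the"]
probs = [0.4, 0.3, 0.2, 0.1]

def greedy(probs):
    # Always pick the most probable word: same output on every call.
    return vocab[probs.index(max(probs))]

def sample(probs, rng):
    # Draw a word at random in proportion to its probability:
    # the output can change from call to call.
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random()  # unseeded, so results vary between runs
print("greedy: ", [greedy(probs) for _ in range(5)])
print("sampled:", [sample(probs, rng) for _ in range(5)])

# Fixing the seed makes the sampled sequence repeatable.
run_1 = [sample(probs, r) for r in [random.Random(42)] for _ in range(5)]
run_2 = [sample(probs, r) for r in [random.Random(42)] for _ in range(5)]
print("seeded runs match:", run_1 == run_2)
```

If sampling is what is happening, fixing the random seed before each call (or picking the most probable word instead of sampling) should make the output repeatable.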