Can someone help me understand how training happens for this model? I understand how skip-gram and GloVe work, but this one is not clear to me.
Does the training happen something like this:
3701 (0,0) is the input word, and the next word in the sequence, 68 (0,1), is the output word. We train a network to make these predictions and, in the process, learn the embeddings? Am I getting this right, or does it happen in some other way? Can someone please clarify?
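
To make my guess concrete, here is a minimal sketch of the (input, output) pairing I have in mind. It only illustrates my assumption, not how the model is actually trained: the helper name `next_word_pairs` is made up, and every token ID other than 3701 and 68 is hypothetical.

```python
def next_word_pairs(token_ids):
    """Pair each token with the token that follows it in the sequence."""
    return list(zip(token_ids[:-1], token_ids[1:]))


# 3701 and 68 come from the example above; 15 and 902 are hypothetical IDs.
sequence = [3701, 68, 15, 902]
pairs = next_word_pairs(sequence)
print(pairs)  # [(3701, 68), (68, 15), (15, 902)]
```

If this guess is right, each pair would be one training example: the first ID is fed to the network and the second is the prediction target.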