I was going through the video on the language model matrix in lesson 4 of Part 1 of the course. I understood what the rows and columns represent, and that fastai's language model data loader randomly varies the breakpoints (the sequence length of each batch) to inject some randomness. What I didn't understand was this: since the next matrix is just the previous one shifted by one word, where does the idea of randomly sized matrices (with lengths close to `bptt`) come into the picture?
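To make the question concrete, here is a minimal plain-Python sketch of how I picture the batching. The names `batchify` and `lm_batches` are hypothetical, and the length rule (usually `bptt`, occasionally `bptt / 2`, then jittered with a normal distribution of standard deviation 5) is my assumption about what the lesson describes, not fastai's actual code.

```python
import random


def batchify(tokens, bs):
    """Lay a token stream out as a matrix with bs columns.

    Each column is a contiguous stretch of the text read top to bottom,
    so row t holds the t-th token of each of the bs streams.
    """
    n_rows = len(tokens) // bs
    tokens = tokens[: n_rows * bs]  # drop the ragged tail
    # Column j is tokens[j * n_rows : (j + 1) * n_rows].
    return [[tokens[j * n_rows + t] for j in range(bs)] for t in range(n_rows)]


def lm_batches(matrix, bptt, seed=None):
    """Yield (x, y) row slices whose length varies randomly around bptt.

    Assumed rule: base length is bptt with probability 0.95, else bptt / 2,
    then a normal jitter (std 5) with a floor of 5 rows.
    """
    rng = random.Random(seed)
    n = len(matrix)
    i = 0
    while i < n - 1:
        base = bptt if rng.random() < 0.95 else bptt / 2
        seq_len = max(5, int(rng.gauss(base, 5)))
        seq_len = min(seq_len, n - 1 - i)  # never run past the last target row
        x = matrix[i : i + seq_len]          # inputs: seq_len rows
        y = matrix[i + 1 : i + 1 + seq_len]  # targets: the same rows shifted down by one
        yield x, y
        i += seq_len  # the next batch starts after this one ends


m = batchify(list(range(1000)), bs=4)  # 250 rows x 4 columns
for x, y in lm_batches(m, bptt=20, seed=0):
    print(len(x), x[0], y[0])
```

In this sketch the one-word shift only relates `x` to `y` within a single batch, while the random `seq_len` decides how many rows each successive batch takes.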

