Language model matrix composition in Lesson 4 part 1 (01:50:00)


I was going through the video on the language model matrix in lesson 4 of part 1 of the course. I understand what the rows and columns represent, and that the data loader randomly varies the breakpoints to inject some randomness. What I didn't understand is this: since the target matrix is just the input shifted by one word, where do random-sized matrices (with lengths close to bptt) come into the picture?
If anyone can provide any input, it would be really helpful!

I think I have some idea now and would like to check whether it is correct. One batch fed to the network will contain, say, a 75x64 mini-batch along with its target matrix shifted by one word. The next batch will have a random size Sx64, where S is approximately 70, again with its one-word-shifted target. This continues until we cover the 10 million rows in the input. Am I correct here?
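To make that concrete, here is a rough sketch (not the actual fastai source) of how such a variable-length batcher could work. The function name `lm_batches` and the exact sampling heuristic (95% of the time sample around `bptt`, occasionally around `bptt/2`, std-dev 5) are my assumptions based on what the lesson describes:

```python
import numpy as np

def lm_batches(tokens, bs=64, bptt=70, seed=0):
    """Yield (input, target) mini-batches for a language model.

    tokens: 1-D sequence of word ids from the concatenated corpus.
    Each batch is (seq_len x bs); the target is the input shifted by
    one word. seq_len varies randomly around bptt, so successive
    epochs don't always break the text at the same points.
    NOTE: illustrative sketch only, not the fastai implementation.
    """
    rng = np.random.default_rng(seed)
    n = len(tokens) // bs
    # Reshape the token stream into bs columns; each column is a
    # contiguous chunk of the text, read top to bottom.
    data = np.asarray(tokens[:n * bs]).reshape(bs, n).T  # shape (n, bs)
    i, first = 0, True
    while i < n - 1:
        if first:
            seq_len, first = bptt, False  # first batch uses bptt exactly
        else:
            # Assumed heuristic: mostly near bptt, sometimes bptt/2.
            base = bptt if rng.random() < 0.95 else bptt / 2
            seq_len = max(5, int(rng.normal(base, 5)))
        seq_len = min(seq_len, n - 1 - i)   # don't run off the end
        x = data[i : i + seq_len]           # input:  rows i .. i+seq_len-1
        y = data[i + 1 : i + 1 + seq_len]   # target: same rows shifted by 1
        yield x, y
        i += seq_len
```

So within one batch the target is always exactly one word ahead of the input; the randomness only affects how many rows (words per column) each batch gets.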