With a batch size of 64, the first row of the tensor representing the first batch contains the first 64 tokens for the dataset. What does the second row of that tensor contain? What does the first row of the second batch contain? (Careful—students often get this one wrong! Be sure to check your answer against the book website.)

I have been unable to figure out what this question is asking. I don't think the batch size is what determines the number of tokens in the first row; isn't that seq_len instead?

My thinking is that the batch size of 64 is unrelated to the fact that the first row contains the first 64 tokens of the dataset; that 64 would be the sequence length. My mental picture is that the batch size is the number of rows and the sequence length is the number of columns. The 2nd row of the first batch would then contain the 64 (seq_len) tokens that start right after the first n * seq_len tokens, where n is the number of batches.

t = len(tokens)          # t is the total number of tokens
n = t // (bs * seq_len)  # n is the number of batches; bs is the batch size, seq_len is the sequence length

So the 1st rows of all the batches, taken together, would contain the first n * seq_len tokens,
the 2nd rows of all the batches would contain the next n * seq_len tokens after those, and
so on down the line…

My answer to the question would now be: the 2nd row of the 1st batch contains the next seq_len tokens after the first n * seq_len tokens, i.e. the 64 tokens that follow the first n * 64 tokens. Since n is not given (the total number of tokens is not given), I am assuming the answer has to be stated in terms of some number of batches n.
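
To check this mental picture, here is a minimal sketch in plain Python. It assumes the layout I described above (bs contiguous streams, each cut into chunks of seq_len); `make_batches` is a hypothetical helper of my own, not a library function, and the choice of 3 batches is arbitrary since the question does not give the total number of tokens.

```python
def make_batches(tokens, bs, seq_len):
    """Split tokens into bs contiguous streams, then cut each stream into
    chunks of seq_len. Batch j, row i is the j-th chunk of stream i."""
    n = len(tokens) // (bs * seq_len)   # number of batches
    stream_len = n * seq_len            # tokens covered by one row position across all batches
    streams = [tokens[i * stream_len:(i + 1) * stream_len] for i in range(bs)]
    return [[s[j * seq_len:(j + 1) * seq_len] for s in streams] for j in range(n)]

# Assumed values: bs and seq_len both 64 as in the question, 3 batches picked arbitrarily.
bs, seq_len, n_batches = 64, 64, 3
tokens = list(range(bs * seq_len * n_batches))  # token i is just the integer i

batches = make_batches(tokens, bs, seq_len)
print(batches[0][0][0])  # first token of batch 0, row 0
print(batches[0][1][0])  # first token of batch 0, row 1
print(batches[1][0][0])  # first token of batch 1, row 0
```

With these assumed numbers, row 0 of the first batch starts at token 0, row 1 of the first batch starts at token n * seq_len = 192, and row 0 of the second batch starts at token 64. So under this layout the first row of the second batch would simply continue where the first row of the first batch left off.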