I was reading the code that defines the classes MultiBatchRNN and PoolingLinearClassifier (link), and I wanted to confirm my understanding of how they work. To illustrate, this is what I think happens:

Imagine our input sentences are all 12 words long and our batch size is 5, so the block is shaped (12, 5). Assume we set bptt to 4, so the first step is to divide each sentence into 3 parts of 4 words each, colored in blue, yellow, and red.

Then, each colored block is fed into RNN_Encoder one by one, so we end up with a list of 3 outputs. Now we concatenate them in such a way that the first rows from each output are regrouped together into a list, the second rows together, and so on, so we end up with a list of 4 outputs, each shaped (3, 5), like the middle block above.
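The chunking step can be sketched in plain PyTorch. This is only an illustration under stated assumptions, not the library's code: the `emb` and `lstm` modules are hypothetical stand-ins for RNN_Encoder, the vocabulary size (100) and embedding size (8) are made up, and the chunk outputs are simply concatenated along the time axis, which may differ from the regrouping described above.

```python
import torch

seq_len, bs, bptt, hidden = 12, 5, 4, 400

# Hypothetical stand-in for RNN_Encoder: an embedding plus one LSTM layer.
emb = torch.nn.Embedding(100, 8)
lstm = torch.nn.LSTM(8, hidden)

tokens = torch.randint(0, 100, (seq_len, bs))  # the (12, 5) block of token ids

chunk_outputs = []
state = None  # assumption: hidden state is carried from one chunk to the next
for i in range(0, seq_len, bptt):
    chunk = tokens[i:i + bptt]            # one colored block, shaped (4, 5)
    out, state = lstm(emb(chunk), state)  # out is shaped (4, 5, 400)
    chunk_outputs.append(out)

# Assumption: join the 3 chunk outputs back along the time axis.
full = torch.cat(chunk_outputs, dim=0)    # shaped (12, 5, 400)
print(len(chunk_outputs), tuple(full.shape))
```

With bptt = 4 and 12 time steps, the loop runs 3 times, once per colored block.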

Moving on to PoolingLinearClassifier, it keeps the last output only (i.e., 4, 8, 12 above) and applies average and max pooling to it. Finally, it combines the pooled results with the last row (12) into one long vector and feeds it into the linear layers (i.e., torch.cat([output[-1], mxpool, avgpool], 1)).

If you look at the final shape of your Tensor, it should be something like [bs, 1200].
That 1200 comes from 400 + 400 + 400, the concatenation of [last(lastOut), maxPool(lastOut), avgPool(lastOut)],
where lastOut is the last output ([4, 8, 12] in your explanation).

NOTE: 400 is the output size of the last LSTM layer in the Encoder…
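The 400 + 400 + 400 arithmetic can be checked with a small sketch. It is a minimal reconstruction from the torch.cat expression quoted in the question, not the library's actual implementation: `output` is a random stand-in for lastOut, and the pooling is done with plain `max` and `mean` over the time axis.

```python
import torch

sl, bs, hidden = 12, 5, 400
# Random stand-in for lastOut, shaped (time steps, batch size, hidden size).
output = torch.randn(sl, bs, hidden)

last = output[-1]               # activations at the final time step: (bs, 400)
mxpool = output.max(dim=0)[0]   # max over all time steps: (bs, 400)
avgpool = output.mean(dim=0)    # mean over all time steps: (bs, 400)

# Same concatenation order as in the question: last, max pool, average pool.
x = torch.cat([last, mxpool, avgpool], 1)
print(tuple(x.shape))
```

The first 400 columns of `x` are the last time step, and the remaining 800 summarize every time step through max and average pooling.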

AFAIK, regrouping is a kind of “activations engineering” (feature engineering in the domain of activations, somewhat similar to DenseNet) that takes into account information from all the important parts of lastOut and not only the very last Tensor.