I was reading the code that defines the class PoolingLinearClassifier (link) and I just wanted to confirm my understanding of how it works. To illustrate, this is what I think happens:
Imagine our input sentences are all 12 words long and our batch size is 5, so the block is shaped (12, 5). Then assume we set bptt to 4, so the first step is to divide each sentence into 3 parts, each 4 words long, colored in blue, yellow, and red.
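To make the split concrete, here is a small sketch of how I picture it. The tensor of token ids and the variable names are made up for illustration; this is not the library's code.

```python
import torch

seq_len, batch_size, bptt = 12, 5, 4

# Toy batch of token ids, shaped (seq_len, batch_size) as in the example above.
batch = torch.arange(seq_len * batch_size).reshape(seq_len, batch_size)

# Slice along the sequence dimension in steps of bptt.
chunks = [batch[i:i + bptt] for i in range(0, seq_len, bptt)]

chunk_shapes = [tuple(c.shape) for c in chunks]
print(len(chunks), chunk_shapes)  # 3 [(4, 5), (4, 5), (4, 5)]
```

Concatenating the three chunks back along dimension 0 gives the original (12, 5) block.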
Then, each colored block is fed into RNN_Encoder one by one, so we end up with a list of 3 outputs. Now we concatenate them so that the first rows of each output are grouped together, the second rows together, and so on, giving a list of 4 outputs, each shaped (3, 5), like the middle block above.
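In code, the regrouping I have in mind would look roughly like this. It only mirrors my description, which may not match what the library actually does, and `fake_encoder` is a hypothetical stand-in that returns its input unchanged so the rows stay easy to track.

```python
import torch

seq_len, batch_size, bptt = 12, 5, 4
batch = torch.arange(seq_len * batch_size).reshape(seq_len, batch_size)

def fake_encoder(chunk):
    # Hypothetical stand-in for RNN_Encoder: passes the chunk through untouched.
    return chunk

# One output per bptt-sized chunk: 3 outputs, each shaped (4, 5).
outputs = [fake_encoder(chunk) for chunk in torch.split(batch, bptt)]

# Regroup: the k-th row of every chunk output goes into group k.
regrouped = [torch.stack([o[k] for o in outputs]) for k in range(bptt)]

regrouped_shapes = [tuple(r.shape) for r in regrouped]
print(len(regrouped), regrouped_shapes[0])  # 4 (3, 5)
```

With this grouping, the last group holds words 4, 8, and 12 of each sentence (rows 3, 7, and 11 when counting from zero).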
Moving on to PoolingLinearClassifier, it keeps only the last output (i.e., 4, 8, 12 above) and applies average and max pooling to it. Finally, it combines the pooled results with the last row (12) into one long vector and feeds it into the linear layers (i.e., torch.cat([output[-1], mxpool, avgpool], 1)).
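Here is a minimal sketch of that concatenation step. It assumes `output` is a (seq_len, batch_size, hidden) tensor and uses a made-up hidden size of 3. The pooling is done with plain `max` and `mean` over the sequence dimension, which is my simplification and not necessarily how the library computes `mxpool` and `avgpool`.

```python
import torch

seq_len, batch_size, hidden = 12, 5, 3  # hidden = 3 is an arbitrary toy value

torch.manual_seed(0)
# Assumed shape of the encoder output: (seq_len, batch_size, hidden).
output = torch.randn(seq_len, batch_size, hidden)

# Pool over the sequence dimension; each result is (batch_size, hidden).
mxpool = output.max(dim=0).values
avgpool = output.mean(dim=0)

# Last time step, max pool, and average pool side by side for each example.
features = torch.cat([output[-1], mxpool, avgpool], 1)
print(features.shape)  # torch.Size([5, 9])
```

Under these assumptions the linear layers would receive one vector of length 3 * hidden per example.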
Phew… Am I right in my interpretation?