I am reading the TagSpace paper and I am having some difficulty understanding `Section 3 - Convolutional Embedding Model` and the figure provided for it.

> A convolution layer is then applied to the `l × d` input matrix, which considers all successive windows of text of size `K`, sliding over the document from position `1 to l`. This requires a further `Kd × H` weights and `H` biases to be learned. To account for words at the two boundaries of the document we also apply a special padding vector at both ends. In our experiments `K` was set to 5 and `H` was set to `1000`. After the convolutional step, a `tanh` nonlinearity followed by a max operation over the `l × H` features extracts a fixed-size (`H`-dimensional) global feature vector, which is independent of document size. Finally, another `tanh` non-linearity followed by a fully connected linear layer of size `H × d` is applied to represent the entire document in the original embedding space of `d` dimensions.
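To check my reading of the dimensions, I tried sketching the whole section in NumPy (variable names and the toy sizes `l`/`d` are my own; only `K = 5` and `H = 1000` come from the paper, and I shrank `H` here to keep it fast):

```python
import numpy as np

l, d, K, H = 10, 8, 5, 16  # toy sizes; the paper uses K = 5, H = 1000

x = np.random.randn(l, d)             # document: l word embeddings of dimension d
pad = np.zeros(((K - 1) // 2, d))     # "special padding vector" at both ends
xp = np.vstack([pad, x, pad])         # padded input, shape (l + K - 1, d)

W = np.random.randn(K * d, H)         # the Kd × H convolution weights
b = np.random.randn(H)                # the H biases

# each window of K consecutive rows is flattened to length Kd
# and mapped to H features, giving l windows in total
conv = np.stack([xp[i:i + K].reshape(-1) @ W + b for i in range(l)])
print(conv.shape)                     # (l, H)

feat = np.tanh(conv).max(axis=0)      # max over positions -> H-dim global feature
Wo = np.random.randn(H, d)            # final H × d fully connected layer
doc = np.tanh(feat) @ Wo              # d-dim document embedding
print(doc.shape)                      # (d,)
```

If this sketch is right, the shapes line up as `(l + K - 1) × d` → `l × H` → `H` → `d`, but I'm not sure it matches what the figure shows.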

I don’t understand what **H** stands for, or how in the figure a matrix of dimensions `(l + K - 1) × d` is transformed into a matrix of dimensions `l × H`. Since it is a convolution operation, I was expecting an output matrix of dimensions `l × d`.