Hi,

I have a tensor **X**, which is of shape:

`batch_size X number_of_sequence X hidden_size`

ex: 10 X 253 X 768 … This tensor **X** is from the output of an LSTM.

There is another tensor **Y**, of shape:

`batch_size X number_of_sequence X embedding_size`

ex: 10 X 253 X 300 … This tensor **Y** is the output from an Embedding.

I need to work with these two tensors **X** and **Y** and feed them to an attention network to match the sequence **Y** against each element in **X**. I need a bit of help deciding which operation would be better for packing **X** and **Y** together… I mean, if I do `torch.cat((X, Y), dim=2)`

will this be a good idea?
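For what it's worth, here is a minimal sketch (with random stand-in tensors, since the actual LSTM and embedding outputs aren't shown) confirming that `torch.cat` along `dim=2` works here, because the batch and sequence dimensions of **X** and **Y** match:

```python
import torch

batch_size, seq_len = 10, 253
hidden_size, embedding_size = 768, 300

# Stand-ins for the LSTM output X and the embedding output Y
X = torch.randn(batch_size, seq_len, hidden_size)
Y = torch.randn(batch_size, seq_len, embedding_size)

# Concatenate along the feature dimension; all other dims must match
Z = torch.cat((X, Y), dim=2)
print(Z.shape)  # torch.Size([10, 253, 1068])
```

The result has feature size `hidden_size + embedding_size = 1068` per time step, which the attention network would then need to accept as its input size.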