Need help identifying the proper operations for two sequential tensors

Hi,

I have a tensor X of shape:

batch_size x number_of_sequence x hidden_size

e.g. 10 x 253 x 768. This tensor X is the output of an LSTM.

There is another tensor Y, of shape:

batch_size x number_of_sequence x embedding_size

e.g. 10 x 253 x 300. This tensor Y is the output of an Embedding.

I need to work with these two tensors X and Y and feed them to an attention network that matches the sequence Y to each element in X. I need a bit of help deciding which operation would be best for combining X and Y. For example, would torch.cat((X, Y), dim=2) be a good idea?
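
To make the question concrete, here is a minimal sketch of two options, using random tensors in place of the real LSTM and Embedding outputs. The first is the concatenation from the question. The second is only an assumption about what "matching Y to each element in X" could mean: cross-attention with nn.MultiheadAttention, where X supplies the queries and Y the keys and values. The variable names and num_heads=8 are arbitrary choices for illustration, not part of the actual model.

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_size, embedding_size = 10, 253, 768, 300

# Random stand-ins (assumption): the real X comes from an LSTM, the real Y from an Embedding.
X = torch.randn(batch_size, seq_len, hidden_size)
Y = torch.randn(batch_size, seq_len, embedding_size)

# Option 1: concatenate along the feature dimension (dim=2).
# Position i of X is paired only with position i of Y.
XY = torch.cat((X, Y), dim=2)
print(XY.shape)  # torch.Size([10, 253, 1068])

# Option 2 (assumption): cross-attention, so every element of X can attend
# over the whole sequence Y. kdim/vdim let Y keep its own feature size.
attn = nn.MultiheadAttention(
    embed_dim=hidden_size,   # query (X) feature size; must be divisible by num_heads
    num_heads=8,             # arbitrary choice for this sketch
    kdim=embedding_size,     # key (Y) feature size
    vdim=embedding_size,     # value (Y) feature size
    batch_first=True,        # inputs are (batch, seq, feature)
)
out, weights = attn(query=X, key=Y, value=Y)
print(out.shape)      # torch.Size([10, 253, 768])
print(weights.shape)  # torch.Size([10, 253, 253]), averaged over heads
```

With torch.cat the result only aligns the two sequences position by position, whereas the attention variant produces a 253 x 253 weight matrix per batch item relating each element of X to every element of Y.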