Hi,
I have a tensor X , which is of shape:
batch_size X number_of_sequence X hidden_size
ex: 10 X 253 X 768 … This tensor X is from the output of an LSTM.
There is another tensor Y, of shape:
batch_size X number_of_sequence X embedding_size
ex: 10 X 253 X 300 … This tensor Y is the output from an Embedding.
I need to work with these 2 tensors X and Y and feed them to an attention network that matches the sequence Y against each element of X. I need a bit of help deciding which operation would be better for packing X and Y together … I mean, if I do torch.cat((X, Y), dim=2)
will this be a good idea?
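To make the question concrete, here is a minimal sketch of what I am considering (shapes taken from the example above; the random tensors just stand in for the real LSTM and embedding outputs):

```python
import torch

batch_size, seq_len = 10, 253
hidden_size, embedding_size = 768, 300

# Stand-ins for the real outputs: X from an LSTM, Y from an Embedding
X = torch.randn(batch_size, seq_len, hidden_size)
Y = torch.randn(batch_size, seq_len, embedding_size)

# Concatenate along the last (feature) dimension.
# This works because the batch and sequence dimensions match.
Z = torch.cat((X, Y), dim=2)
print(Z.shape)  # torch.Size([10, 253, 1068])
```

So the result would be a single tensor of shape batch_size X number_of_sequence X (hidden_size + embedding_size), which could then be fed to the attention network.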