Time Series Prediction

Hi all,
I have multivariate time series data in which all the features are correlated to some extent. The objective is to use RNNs to create an embedding for each feature, such that the embedding holds information about the history of that feature and about how it is correlated with all the other features.
For this I thought of an encoder-decoder seq2seq prediction architecture. The encoder has two RNN cells that learn in parallel: the first looks at the time series of all the other features up to time point t, and the second looks only at the one feature for which I want the embedding, also up to time point t. The outputs of the two RNNs are then joined by a linear layer to form the embedding.
The decoder uses this embedding to predict the value at t+1 of the single feature read by the second RNN in the encoder. I’m hoping that this way the embedding will capture the nonlinear relationships between all the features.
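
To make the setup concrete, here is a minimal sketch of the encoder and decoder described above. It assumes PyTorch, uses GRUs as the RNN cells, and reduces the decoder to a single linear read-out for the one step t+1. The class name, layer sizes, and the tanh on the embedding are illustrative choices, not part of the original idea.

```python
import torch
import torch.nn as nn


class DualEncoderPredictor(nn.Module):
    """Two parallel GRU encoders joined by a linear layer, plus a one-step decoder."""

    def __init__(self, n_other, hidden_size=32, embed_size=16):
        super().__init__()
        # First RNN: reads all the other features up to time t.
        self.context_rnn = nn.GRU(n_other, hidden_size, batch_first=True)
        # Second RNN: reads only the target feature up to time t.
        self.target_rnn = nn.GRU(1, hidden_size, batch_first=True)
        # Linear layer that joins the two final hidden states into the embedding.
        self.join = nn.Linear(2 * hidden_size, embed_size)
        # Decoder (assumption): a linear read-out of the target feature at t+1.
        self.decoder = nn.Linear(embed_size, 1)

    def forward(self, others, target):
        # others: (batch, time, n_other); target: (batch, time, 1)
        _, h_ctx = self.context_rnn(others)  # h_ctx: (1, batch, hidden_size)
        _, h_tgt = self.target_rnn(target)   # h_tgt: (1, batch, hidden_size)
        joined = torch.cat([h_ctx[-1], h_tgt[-1]], dim=-1)
        embedding = torch.tanh(self.join(joined))
        prediction = self.decoder(embedding)  # (batch, 1)
        return prediction, embedding


torch.manual_seed(0)
model = DualEncoderPredictor(n_other=4)
others = torch.randn(8, 20, 4)  # 8 sequences, 20 time steps, 4 other features
target = torch.randn(8, 20, 1)  # the one feature to embed
pred, emb = model(others, target)
print(pred.shape, emb.shape)
```

Training would minimise something like the mean squared error between `pred` and the true value of the target feature at t+1, and `emb` is the vector that would be kept as that feature's embedding.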
Then the feature that is input to the second RNN will be swapped with one of the features input to the first RNN, so that the encoder learns the embedding for that other feature.
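
The swap can be sketched in plain Python as one pass per feature, where each feature takes a turn as the input to the second RNN and the rest go to the first. The helper `split_target` and the toy series are hypothetical, and whether the weights are shared or retrained between passes is left open here.

```python
def split_target(series, target_idx):
    """Split a multivariate series into (others, target, next_value).

    series: list of time steps, each a list of feature values.
    The last time step is held out as t+1; everything before it is the history.
    """
    history, future = series[:-1], series[-1]
    # Input to the first RNN: every feature except the target one.
    others = [[v for j, v in enumerate(row) if j != target_idx] for row in history]
    # Input to the second RNN: the target feature only.
    target = [[row[target_idx]] for row in history]
    # Label for the decoder: the target feature at t+1.
    return others, target, future[target_idx]


# Toy series: 4 time steps, 3 features.
series = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]

# Each feature in turn becomes the one whose embedding is learned.
for target_idx in range(len(series[0])):
    others, target, label = split_target(series, target_idx)
    print(target_idx, target, label)
```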
Please let me know if you think this is not theoretically sound or where I might run into problems.