Hi! I have a hopefully quick question:
Problem setup: I need to take a tensor of shape BS x 1 x Features (where BS is the batch size) and feed it into an LSTM with TS timesteps, i.e. transform it to BS x TS x Features. However, the gradients need to be backpropagated through this transformation! As far as I know, I need something similar to TimeDistributed from Keras.
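For concreteness, here's a minimal sketch of the transform I mean (the sizes and names are just placeholders for illustration):

```python
import torch

BS, TS, F, H = 4, 5, 3, 8            # batch, timesteps, features, hidden (arbitrary)
x = torch.randn(BS, 1, F)            # encoder output: a single "timestep"
x_rep = x.expand(BS, TS, F)          # view with shape BS x TS x F, no data copy
lstm = torch.nn.LSTM(F, H, batch_first=True)
out, _ = lstm(x_rep)                 # out: BS x TS x H
```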
What I tried is tensor.expand(). The docs say: "Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor ..."
With expand(), the model learns, but maybe only because the data is fairly easy. Does expand() also broadcast my gradients correctly, i.e. sum them up over the TS dimension? Is there a better approach? The repeat() method copies the data, according to the docs. The model works with repeat() too, so I can't really tell what's going on.
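To check the gradient behaviour myself, I'd run a small sanity check like this (not from my actual model): both expand() and repeat() accumulate the gradient for each input element by summing over its TS copies.

```python
import torch

BS, TS, F = 4, 5, 3

# expand(): a view, no data copy
x1 = torch.randn(BS, 1, F, requires_grad=True)
x1.expand(BS, TS, F).sum().backward()

# repeat(): actually copies the data
x2 = torch.randn(BS, 1, F, requires_grad=True)
x2.repeat(1, TS, 1).sum().backward()

# Each input element contributes to TS output elements, so its
# gradient is the sum over those TS copies (here: TS * 1 = 5).
print(x1.grad[0, 0])   # tensor([5., 5., 5.])
print(x2.grad[0, 0])   # tensor([5., 5., 5.])
```

So gradient-wise the two should be equivalent; expand() just avoids the memory copy.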
Also, because it is never that easy: I additionally concatenate a feature vector of size BS x TS x 1 onto the end of the initial feature vector, for each timestep.
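That concatenation would look roughly like this (again, names and the hidden size are made up for the sketch):

```python
import torch

BS, TS, F, H = 4, 5, 3, 8
enc = torch.randn(BS, 1, F, requires_grad=True)   # encoder output
future = torch.randn(BS, TS, 1)                   # extra per-timestep info

# expand to TS timesteps, then append the extra feature along the last dim
dec_in = torch.cat([enc.expand(BS, TS, F), future], dim=2)  # BS x TS x (F + 1)
lstm = torch.nn.LSTM(F + 1, H, batch_first=True)
out, _ = lstm(dec_in)                             # BS x TS x H
```

Note that torch.cat copies its inputs, so the result is a regular contiguous tensor even though enc.expand() is only a view.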
P.S. What I am trying to build is a simple encoder/decoder architecture for time series. No attention, but I do have some additional information available about the "future" that I want to inject.