Hi! I have a hopefully quick question:

Problem setup: I need to take a tensor of shape BS x 1 x Features (where BS is the batch size) and feed it to an LSTM with TS time steps, i.e. transform it to BS x TS x Feat. However, I need the gradients to be backpropagated! As far as I know, I need something similar to TimeDistributed from Keras.

What I tried is tensor.expand(). From the docs: “Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor …”
With expand(), the model learns, but then again, maybe that’s because the data is fairly easy.

Does expand() also broadcast my gradients, i.e. sum them up for each TS? Is there a better approach? The repeat() method copies the data, according to the docs. The model works with repeat() too, so I can’t really tell what’s going on.

Also, because it is never that easy: I also append a feature vector of size BS x TS x 1 to the end of the initial feature vector, for each TS.

Thank you!

p.s. What I am trying to do is a simple encoder/decoder architecture for time series. No attention, but I do have some additional info available about the “future” that I want to inject.

Hi @visoft,
I’m sorry but I’ve been unable to understand exactly what you want.
You say you have a tensor of shape: bs, 1, features. But is this a sequence?
In time series/sequence problems you usually have an input shape = (batch size, features, seq_len), where features may be 1 for a univariate TS or >1 for a multivariate TS, and seq_len is the length of the time series (number of steps).
So based on what you say, it seems you only have a single time step. I think I’m missing something :thinking:

LE: Answer:
Well, I found the answer on the PyTorch forum. Basically, both repeat() and expand() behave the same with respect to backpropagation:
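A quick sanity check (the shapes here are made up just for the demo) shows the source tensor accumulates identical, summed gradients either way:

```python
import torch

x = torch.randn(2, 1, 4, requires_grad=True)

y = x.expand(2, 3, 4)  # view, no new memory
y.sum().backward()
g_expand = x.grad.clone()

x.grad = None
z = x.repeat(1, 3, 1)  # materialized copy
z.sum().backward()
g_repeat = x.grad.clone()

print(torch.equal(g_expand, g_repeat))  # True: gradients are summed over TS either way
```

With three broadcast copies, each source element receives a gradient of 3 from the sum, whether the copies were views or real memory.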

Original question:

I am trying to reimplement these: https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/

The chapter: Encoder-Decoder LSTM Model With [Univariate] Input. Key reading for my problem “First, the internal representation of the input sequence is repeated multiple times, once for each time step in the output sequence. This sequence of vectors will be presented to the LSTM decoder.”

The output of the encoder is bs x 1 x feature (basically the last output of LSTM) This is then “broadcasted” time-wise to the length of the desired output (and fed to decoder LSTM)

And this is what they do in Keras (comments are mine):

# Encoder, take only the last output
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
# Repeat that output once per output time step.
# Does it "broadcast" gradients too? Probably! In Keras
model.add(RepeatVector(n_outputs))
# The decoder, with n_outputs time steps
model.add(LSTM(200, activation='relu', return_sequences=True))

What I do:

   # In __init__:
   self.lstm_enc = nn.LSTM(input_size=in_features, hidden_size=hidden_size, batch_first=True)
   self.lstm_dec = nn.LSTM(input_size=self.hidden_size, hidden_size=hidden_size, batch_first=True)

   # In forward (bs = batch size):
   lstm_enc_out, _ = self.lstm_enc(bn)  # bn has shape BS x some time length x some features
   last_lstm_output = lstm_enc_out[:, -1, :]  # take only the last output
   last_lstm_output = last_lstm_output.view(bs, 1, -1)  # BS x one time step x hidden_size
   last_lstm_rep = last_lstm_output.expand(bs, self.future_steps, self.hidden_size)  # !!!!!
   decoded_lstm, _ = self.lstm_dec(last_lstm_rep)

Now, the model I am implementing is just an exercise, but the gradient issue is bugging me!
L.E: The original post is not about Keras’ TimeDistributed but RepeatVector. My bad.

I have been playing lately with image sequences, trying to predict the irradiance from the sun.
It is pretty hard to train these models, and I have been wondering if the AWD_LSTM may work for this type of continuous task.
My problem is to predict the future steps of a continuous time series that has one image, plus some measured quantities, per past step.
I am using N images from the past (20 for this experiment) and I try to predict the next 15 minutes.
So the x is in blue, and the y in red.
The plotted images are the first, the middle and the last, just to give an idea of whether it is cloudy or not.
As you can imagine, the trick is to predict if the clouds will cover the sun in the next frames.

My base model is very similar to the one in the repo I posted some days ago, with an encoder based on the dilated resnet from @muellerzr and some LSTM layers to capture the temporal relation.
I have a lot of data, millions of images, from different sites in very high quality. I am currently working with 400 days of data (1 image/minute) to test and find a suitable model.
The papers on this topic are pretty useless, with very blurry results and code that is far from best practice. I have implemented a handful of them, with weak results.
Any ideas?


I don’t get your problem. You just need to format your data into the correct input shape; make the batches:

[bs, seq_len, features]

and nn.LSTM(features, hidden_dim) will return a tensor of shape [bs, seq_len, hidden_dim]; no need to repeat or expand.

So your model would be:

class Model(Module):
  def __init__(self, input_size, hidden_dim):
     self.lstm_enc = nn.LSTM(input_size, hidden_dim, batch_first=True)
     self.lstm_dec = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
  def forward(self, x):
    x, _ = self.lstm_enc(x)  # nn.LSTM returns (output, (h, c))
    out, _ = self.lstm_dec(x)
    return out


model = Model(16, 32)
model(torch.rand(8, 10, 16)).shape  # torch.Size([8, 10, 32])

Hi, that’s a really interesting problem! I am also starting to work on a project about deep learning and space weather. Currently we are using N-BEATS as the architecture for forecasting the solar flux (for now, just univariate forecasting).

Unfortunately I cannot help you because I think you are making kind of a multimodal learner with both images and time series as input, is that correct? However, I would like to know more about the sources of data that you are using, in case they are publicly available. Is there anywhere that I can see more of this work?

Thank you so much!

Thank you for the answer. What I did not say is that I want to decouple the encoder from the decoder because input and output have different lengths (not a problem per se), and I have some prior info about the future that I want to append to the output of the encoder. Otherwise, yes, lstm_dec(lstm_enc(input)) would do the trick just fine. Also, the encoder will be a CNN or some other fancy architecture.
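A minimal sketch of that decoupling (all sizes and variable names below are made up for illustration): broadcast the encoder’s last output over the desired number of future steps, then concatenate the known future covariates per step before the decoder:

```python
import torch
import torch.nn as nn

bs, in_len, out_len = 8, 10, 5              # hypothetical sizes
in_features, hidden_size, n_future = 4, 32, 1

enc = nn.LSTM(in_features, hidden_size, batch_first=True)
dec = nn.LSTM(hidden_size + n_future, hidden_size, batch_first=True)

x = torch.randn(bs, in_len, in_features)
future = torch.randn(bs, out_len, n_future)  # prior info about the "future"

enc_out, _ = enc(x)
last = enc_out[:, -1:, :]                    # (bs, 1, hidden_size), last encoder output
rep = last.expand(bs, out_len, hidden_size)  # broadcast over the output time steps
dec_in = torch.cat([rep, future], dim=-1)    # append future info at each step
dec_out, _ = dec(dec_in)
print(dec_out.shape)  # torch.Size([8, 5, 32])
```

The encoder here could just as well be a CNN; only the (bs, out_len, hidden_size + n_future) shape of the decoder input matters.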

Take a look at this repo; it uses a CNN to encode the input and an LSTM layer to find the temporal relation.
My problem posted above about the sun forecasting does exactly this, concatenating some prior info about the sun to the LSTM layer after the encoding.
If you need more info, or want to work together, I am very much interested in this type of encoder/LSTM/decoder problem right now.
I have a basic LSTM wrapper to take care of the hidden state and dropout:

class LSTM(Module):
    def __init__(self, input_dim, n_hidden, n_layers, bidirectional=False, p=0.2):
        self.rnn = WeightDropout(nn.LSTM(input_dim, n_hidden, n_layers, batch_first=True, bidirectional=bidirectional), p)
        self.h = None  # hidden state carried across batches

    def reset(self):
        self.h = None

    def forward(self, x):
        raw, h = self.rnn(x, self.h)
        self.h = [h_.detach() for h_ in h]  # truncated BPTT: don't backprop across batches
        return raw, h

then use a learner with the Reset Callback. The main model is like this:

class BasicModel(Module):
  def __init__(self, cnn_encoder, n_features, n_hidden, n_lstm_layers):
    self.encoder = cnn_encoder
    self.n_hidden = n_hidden
    self.lstm = LSTM(512 + n_features, n_hidden, n_lstm_layers)  # features are concatenated before the LSTM
    self.head = nn.Linear(n_hidden, 1)

  def forward(self, x, features):
    "x are the images, features are the timeseries"
    x = torch.stack(x, dim=1)  # stack images together to form a sequence of images: (bs, seq_len, 3, h, w)
    x = self.encoder(x)  # encode images with a resnet, cut after pool; returns a (bs, seq_len, 512) tensor
    x, _ = self.lstm(torch.cat([x, features.permute(0, 2, 1)], dim=-1))  # torch.cat takes a list of tensors
    return self.head(x)

  def reset(self): self.lstm.reset()

there are some reshapes missing, but you get the idea.


There is a lot of data available:

  • The Swimcat datasets contain various types of sky images, for classification and segmentation tasks
  • The WSISEG dataset (we have been using this one) with whole-sky segmentation masks.
  • NREL has a lot of public data from the year 2010, accompanied by weather station measurements.

Thank you! It is a really interesting approach. There are some things that are not familiar to me yet (I’m still a noob :)), especially the cnn_encoder stuff.

I had a look at your action recognition repo. One question: how do you go from a problem with one single output (the class of the action) to a problem forecasting 10-20 future points of a time series?

We can make a call with Ignacio to discuss the details, it is very straightforward.

  • The cnn_encoder is just a resnet created with create_model
  • The time series output comes for free from the LSTM: it will output the same number of steps as inputs.

Hi everyone!

I have been wondering if there is a way to use transfer learning for time series data. Similar to pre-training a language model and fine-tuning a text classifier, couldn’t we use iterative time series forecasting as a self-supervised pre-training method before fine-tuning e.g. a classifier on a fixed window length? This could be useful if large amounts of unlabelled time series data are available, but labelled examples are scarce.
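For instance (just a rough, untested sketch; every module name and size below is made up, not from any library): pre-train an LSTM encoder with a next-step forecasting head on unlabelled series, then swap in a classification head and fine-tune on the labelled windows:

```python
import torch
import torch.nn as nn

# Hypothetical encoder shared between pre-training and fine-tuning
class Encoder(nn.Module):
    def __init__(self, n_feat, hidden):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
    def forward(self, x):
        out, _ = self.lstm(x)
        return out

enc = Encoder(1, 32)
forecast_head = nn.Linear(32, 1)      # self-supervised task: predict x[t+1]
x = torch.randn(16, 50, 1)            # unlabelled univariate windows

pred = forecast_head(enc(x)[:, :-1])  # predict each next step from the current hidden state
loss = nn.functional.mse_loss(pred, x[:, 1:])
loss.backward()                       # pre-training step

# Fine-tuning: keep enc, replace the head with a classifier over fixed windows
clf_head = nn.Linear(32, 3)           # e.g. 3 classes
logits = clf_head(enc(x)[:, -1])      # last hidden state summarizes each window
print(logits.shape)  # torch.Size([16, 3])
```

Whether the features learned by forecasting transfer to classification is exactly the open question; this only shows the plumbing.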

Does anyone have thoughts on this or has maybe even tried it out?


Hi @stefan-ai,

I think this is definitely a great idea, and worth exploring :grinning:.
I’d be very interested to learn if you make any progress in this area.

Personally I haven’t used any self-supervised time series implementation yet, but I think the approach should work.

A new paper published this month addresses this topic:
Jawed, S., Grabocka, J., & Schmidt-Thieme, L. (2020, May). Self-supervised Learning for Semi-supervised Time Series Classification. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 499-511). Springer, Cham.
It’s freely available online.
It’d be great to see a fastai implementation of this approach!

PS: @Epoching wrote an excellent blog post on how to apply this type of training to images using fastai.
Another great blog post that describes SSL for images using fastai has been written by @JoshVarty.
Both may be useful if you or someone else pursues this SSL approach.


Thanks a lot @oguiza for pointing me to these resources. If I get anywhere I will share my results here for sure.

I have been mostly thinking about an RNN based approach similar to ULMFit. But if I remember correctly, I read in one of your earlier posts that turning time series into images and using CNNs has worked better for you, right?


I’m definitely interested in how this turns out! Keep us updated :slight_smile:.

@hfawaz also has an excellent paper about transfer learning for time series classification. If I remember correctly, the conclusions highlighted the importance of the similarity between datasets when transferring knowledge from one to another.

I wonder what would be the intuition of a model trained with massive amounts of time series data in a self-supervised fashion. In the same way that the first layers of a pre-trained model for images recognize basic components of an image, such as corners or gradients, would there be an equivalent to think of for time series? Unlike images and text, two time series from different domains do not necessarily need to share anything apart from having a time axis.


Thanks for the info! That is what I did, +/- some variants. However, I have moved away from LSTMs a bit for now. A bunch of CNNs (time-wise) + some linear layers on top work better. I get the feeling that there is more to the LSTM, so later I will try some attention-like mechanism to inject the prior future. Also, there are Oguiza’s ResNet and Inception adaptations for time series. Hmm, lots of directions and so little time…

In relation to the talk about transfer learning and time series, I just saw this paper accepted on KDD 2020: Multi-Source Deep Domain Adaptation with Weak Supervision for Time-Series Sensor Data

Paper: https://arxiv.org/abs/2005.10996
Code: https://github.com/floft/codats


Yeah, that’s a really intriguing idea. It’d be great to have a model trained across multiple TS datasets that can then be fine-tuned!


Thanks for your reply @vrodriguezf. I will check out that paper as well.

I mostly had the situation in mind where you have a lot of unlabelled time series data and some labelled examples from the same data source. I believe this could already be very helpful for many practical projects. The question is just how much unlabelled data you need to make this work and if the features learned from forecasting are useful for downstream classification tasks.

When it comes to a general time series model trained on large amounts of data from different domains, it’s also not clear to me what would be the underlying learned features and if they are useful for transferring to other domains. But for sure it would be very powerful if it works.