Spatio-Temporal Data->(Pretrained ConvNet + LSTM) or Pretrained ConvLSTM

To work on spatio-temporal data, are there pre-trained ConvLSTM models available?
Otherwise, does it make sense to use a pre-trained ConvNet as a feature extractor and an LSTM for the sequential processing?

Note: the spatial data is not necessarily a set of images; it could be any grid of values of size m×n, e.g. 4×8.

I think you’ll have to elaborate a bit more about the shape of your data.
To feed data into an LSTM you’ll have to get it into a time sequence. So, for example, if I want to feed frames of a video to an LSTM, I’ll pass all the frames through a resnet and reshape the features into the form an LSTM expects.
The documentation states:
input of shape (seq_len, batch, input_size):
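
For reference, here’s a rough sketch of the kind of pipeline I mean, with a pretrained resnet18 as the per-frame encoder. The hidden size, class count and frame size are placeholders I picked, not anything specific to your problem:

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTM(nn.Module):
    """Pretrained ResNet as per-frame encoder, LSTM over the resulting feature sequence."""
    def __init__(self, hidden_size=256, num_classes=10):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        # drop the final fc layer; keep the 512-dim pooled features per frame
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (seq_len, batch, 3, H, W) -- a sequence of RGB frames
        seq_len, batch = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))               # (seq_len*batch, 512, 1, 1)
        feats = feats.flatten(1).view(seq_len, batch, 512)   # (seq_len, batch, input_size)
        out, _ = self.lstm(feats)                             # (seq_len, batch, hidden_size)
        return self.head(out[-1])                             # predict from the last time step

# usage: 16 frames, batch of 2, 64x64 RGB
model = CNNLSTM()
frames = torch.randn(16, 2, 3, 64, 64)
print(model(frames).shape)  # torch.Size([2, 10])
```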

Now, about pre-training.
The pretrained resnet works well for images because (this is my understanding from Jeremy’s lectures) the initial layers know how to recognize lines and basic shapes well, and the complexity increases as we go deeper. So whether resnet will work well for your data depends on the structure your data encapsulates.
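
If the later, more image-specific layers are the worry, one option is to keep only those early blocks as a frozen feature extractor and train your own layers on top. Roughly like this (the cut-off point is arbitrary here, just to show the idea):

```python
import torch.nn as nn
from torchvision import models

# Keep only the early ResNet blocks (edges / basic shapes) as a frozen
# feature extractor; the [:6] cut-off (conv1 .. layer2) is an arbitrary choice.
resnet = models.resnet18(pretrained=True)
early_features = nn.Sequential(*list(resnet.children())[:6])
for p in early_features.parameters():
    p.requires_grad = False  # freeze the pretrained part
```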

I think you’ll have to try it out!

I think you should separate out your data and create a small experiment where you don’t need the LSTM. Try out the resnet on its own and see whether it’s able to process your data well or not (a sequence-of-length-1 kind of thing). See the sketch below.
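
Something along these lines, for example, treating each grid/frame as an independent sample. The upsampling to an image-sized input and the single regression target are just my assumptions for the 4×8 case:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# "Sequence of length 1" sanity check: can a pretrained ResNet alone learn
# anything from a single grid? The 4x8 grid is upsampled and repeated to 3
# channels so it fits ResNet's expected input (both assumptions on my part).
class FrameOnlyModel(nn.Module):
    def __init__(self, target_dim=1):
        super().__init__()
        self.backbone = models.resnet18(pretrained=True)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, target_dim)

    def forward(self, grid):
        # grid: (batch, 1, 4, 8) -> (batch, 3, 64, 128)
        x = F.interpolate(grid, scale_factor=16, mode="bilinear", align_corners=False)
        x = x.repeat(1, 3, 1, 1)
        return self.backbone(x)

model = FrameOnlyModel()
print(model(torch.randn(8, 1, 4, 8)).shape)  # torch.Size([8, 1])
```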

I am currently working on this; what did you end up doing?

  • Did you pretrain the encoder? If yes, did it work?
  • How did you encode the images to feed the LSTM?
  • Did you add extra data to the encoded images?
    I am currently working with sky images to forecast irradiance levels, so I’m trying to “encode” cloud behavior, with little success.