I think you’ll have to elaborate a bit more about the state of your data.
To feed data into an LSTM you’ll have to get your data into a time sequence. So for example, if I want to feed frames of a video to an LSTM, I’ll have to pass all the frames through a resnet and reshape the resulting features into the format an LSTM expects.
The documentation states:
input of shape (seq_len, batch, input_size)
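To make that shape concrete, here’s a minimal sketch of feeding per-frame features to `nn.LSTM`. The sizes (16 frames, batch of 4, 512-dim features) are just placeholders for illustration; `features` stands in for whatever your resnet backbone produces per frame.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 16 frames per clip, 4 clips per batch,
# 512-dim feature vector per frame (e.g. from a resnet backbone)
seq_len, batch, feat_dim = 16, 4, 512

# Stand-in for per-frame features already extracted by the resnet
features = torch.randn(seq_len, batch, feat_dim)

# nn.LSTM defaults to batch_first=False, i.e. (seq_len, batch, input_size)
lstm = nn.LSTM(input_size=feat_dim, hidden_size=256)
output, (h_n, c_n) = lstm(features)

print(output.shape)  # (seq_len, batch, hidden_size) -- one hidden state per frame
print(h_n.shape)     # (num_layers, batch, hidden_size) -- final hidden state
```

If your features come out batch-first, either permute them or pass `batch_first=True` to the LSTM.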
Now, about pretrained models.
The pretrained resnet works well for images because (this is my understanding from Jeremy’s lectures) its initial layers have learned to recognize lines and basic shapes well, and the features grow more complex as you go deeper. So whether resnet will work well for your data depends on the structure your data encapsulates.
I think you’ll have to try it out!
I think you should separate out your data and create a small experiment where you don’t need the LSTM. Try out the resnet on its own and see whether it’s able to process your data well or not (a sequence-of-length-1 kind of thing).