Pretrained RNN architectures

We often use a pretrained resnet for our image classification tasks so that we can benefit from its low-level feature extraction (as well as its specific architecture which has been proven to be effective).

What pretrained RNNs are there for object classification in video streams? It seems like the lack of competitions for video analysis, and lack of datasets, reduces the number of models in this area but hopefully someone can connect me with a few good RNN architectures. (Yes I realize that I could use an CNN on each frame, but for my use case I think an RNN will offer an advantage)