The "YOLO" of video to detect patterns/actions

Hi FastAI community,

YOLO is a model that detects objects in images and draws bounding boxes around them. It is fast enough to run frame by frame on a video, which makes it practical for video use.

An “action”/“pattern”, however, only exists across a sequence of frames: the frames have to be looked at together to provide the context.

Is there a “YOLO” model for videos that takes video snippets (3 to 90 frames) and classifies the actions observed in them?

This way, the model would take a video and output a label for the action in each consecutive sequence of frames.

For instance, the model will get a video snippet of a man riding a bicycle and turning right, and it will label it as “turning right”.
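
To make the question more concrete, here is a rough sketch of the pipeline I have in mind, not a real model. It only shows the windowing part: cut the video into fixed-length snippets and pass each one to a classifier. The names `sliding_snippets`, `label_video` and `dummy_classifier` are made up for illustration, and the "frames" are plain numbers standing in for images. The classifier is a placeholder for whatever video model would actually do the work.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_snippets(num_frames: int, window: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame indices (end exclusive) for every full window."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Windows that would run past the end of the video are skipped.
    for start in range(0, num_frames - window + 1, stride):
        yield start, start + window


def label_video(
    frames: Sequence,
    classify: Callable[[Sequence], str],
    window: int = 16,
    stride: int = 8,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for each snippet of `window` frames."""
    return [
        (start, end, classify(frames[start:end]))
        for start, end in sliding_snippets(len(frames), window, stride)
    ]


def dummy_classifier(snippet: Sequence[float]) -> str:
    """Placeholder for a real action-recognition model (assumption).

    Each fake "frame" is one number, pretending to be a steering signal.
    """
    mean = sum(snippet) / len(snippet)
    return "turning right" if mean > 0.5 else "riding straight"


# Fake video: 16 frames riding straight, then 16 frames turning right.
video = [0.0] * 16 + [1.0] * 16
labels = label_video(video, dummy_classifier, window=16, stride=16)
print(labels)
```

With a `stride` smaller than `window` the snippets overlap, so the same frames get labelled more than once. What I am asking about is the real model that would replace `dummy_classifier`.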