for Video Understanding?

Hi everyone,

I read the Image Sequence tutorial in the docs and thought it was overall a decent introduction to video understanding with the library. However, I found many limitations, particularly with data augmentations (video cropping, frame sampling, etc.) and with the available pre-trained models (none for video). I tried looking through what people have done so far, but it seems the library does not currently have much support for video understanding.

Personally, I loved using the library in the past, so I am thinking about forking my own version that will contain pre-trained video models along with video-specific augmentations, largely based on the pytorchvideo library. Before I embark on this, has anyone else already started something similar? Ideally, I wouldn’t want to recreate something that someone else has already done. If not, does anyone have any tips for how I should structure my project/repo? I would be happy if other people could use what I create :smile: