I am currently working on the Kaggle Deepfake Detection Challenge. The first part of the pipeline I am designing reads video frames and extracts faces. I tried several vanilla approaches for reading video frames, such as OpenCV with and without threading. Later, I found Decord in this kernel, which reads directly to the GPU (5x faster than on CPU) but seems to have memory leaks. I tried using the great ipyexperiments library from @stas to work around the leak, but the problem still seems to persist.
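For anyone landing here, this is roughly what the plain OpenCV (no threading) baseline looks like. It is only a sketch and not the exact code from my pipeline: the helper names `sample_frame_indices` and `read_frames` and the default of 16 frames per video are made up for illustration.

```python
def sample_frame_indices(n_frames, n_samples):
    """Return up to n_samples evenly spaced frame indices in [0, n_frames)."""
    if n_frames <= 0 or n_samples <= 0:
        return []
    n_samples = min(n_samples, n_frames)
    step = n_frames / n_samples
    return [int(i * step) for i in range(n_samples)]


def read_frames(path, n_samples=16):
    """Read evenly spaced frames from a video file as RGB arrays (CPU only)."""
    # Imported here so the index helper above works without OpenCV installed.
    import cv2

    cap = cv2.VideoCapture(path)
    try:
        n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        wanted = set(sample_frame_indices(n_frames, n_samples))
        frames = []
        for idx in range(n_frames):
            # grab() advances to the next frame without decoding it fully.
            if not cap.grab():
                break
            if idx in wanted:
                # retrieve() decodes only the frames we actually keep.
                ok, frame = cap.retrieve()
                if ok:
                    # OpenCV returns BGR; most face detectors expect RGB.
                    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        return frames
    finally:
        cap.release()
```

Using `grab()` for skipped frames and `retrieve()` only for sampled ones avoids converting every frame, which helps somewhat when you only need a handful of frames per video. The frame count reported by the container can be off for some files, hence the early `break`.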
Thanks for sharing! I’m working on that comp too atm and video loading has been a pain! I’m still using the cv2 approach you mentioned, but might have a look at DALI now.
Late to the party.
I use BlazeFace. To save images, I used the torchvision.transforms.ToPILImage pipeline, and for inference I use ToTensor.
I saw a comparison of OpenCV, DALI and Decord somewhere, and OpenCV was the most robust to errors.