I am currently working on the Kaggle Deepfake Detection competition. The first part of the pipeline I am designing reads video frames and extracts faces. I tried several vanilla approaches for reading frames, such as OpenCV with and without threading. Later I found Decord in this kernel, which decodes directly on the GPU (5x faster than CPU) but seems to have memory leaks. I tried the great ipyexperiments library from @stas to work around the leak, but the problem seems to persist.
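For context, here is a minimal sketch of the Decord approach, assuming a CUDA-enabled decord build. The `sample_indices` helper and the explicit `del`/`gc.collect()` after each video are my own workaround attempt for the leak, not a guaranteed fix:

```python
import gc

def sample_indices(n_frames, n_samples):
    """Pick n_samples evenly spaced frame indices from a video of n_frames."""
    if n_frames <= n_samples:
        return list(range(n_frames))
    step = n_frames / n_samples
    return [int(i * step) for i in range(n_samples)]

def read_frames_decord(path, n_samples=16, gpu_id=0):
    """Read evenly spaced frames with Decord, decoding directly on the GPU.

    Dropping the VideoReader and collecting garbage after every video is an
    attempt to limit the reported memory growth; it may not remove it fully.
    """
    from decord import VideoReader, gpu  # assumes decord built with CUDA support
    vr = VideoReader(path, ctx=gpu(gpu_id))
    idx = sample_indices(len(vr), n_samples)
    frames = vr.get_batch(idx).asnumpy()  # (n, H, W, 3) uint8
    del vr
    gc.collect()
    return frames
```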
Also, there is this issue.
I am wondering if there are any suggestions on how to make it work.
Here is my kernel for reference. Ideally at test time we will process 4000 videos without any leak.
It’s a kernel-only competition, so a 3x-5x gain is significant.
I’m also looking into Nvidia DALI.
DALI is more robust (no leaks so far), but I am probably not using it in the most optimized way. There are also some minor issues in detection.
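A rough sketch of a DALI video pipeline, assuming a recent DALI release where the reader is exposed as `fn.readers.video` (older releases had the same reader as `ops.VideoReader`); `sequence_span` is a hypothetical helper to reason about how much of the source video one decoded sequence covers:

```python
def sequence_span(sequence_length, stride):
    """Source frames covered by one decoded sequence of sequence_length
    frames sampled every `stride` frames."""
    return (sequence_length - 1) * stride + 1

def build_video_pipeline(filenames, sequence_length=16, stride=10, device_id=0):
    """Sketch of a DALI pipeline that decodes fixed-length frame sequences
    on the GPU. Parameter names follow the DALI docs; untested here."""
    from nvidia.dali import pipeline_def, fn

    @pipeline_def(batch_size=1, num_threads=2, device_id=device_id)
    def video_pipe():
        return fn.readers.video(
            device="gpu",
            filenames=filenames,
            sequence_length=sequence_length,
            stride=stride,
            normalized=False,  # keep raw uint8 frames
        )

    pipe = video_pipe()
    pipe.build()
    return pipe
```

With `sequence_length=16` and `stride=10`, each decoded sequence spans 151 source frames, roughly 5 seconds of a 30 fps video.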
Thanks for sharing! I’m working on that comp too atm and video loading has been a pain! Currently still using the cv2 approach you mentioned, might have a look at DALI now
I’ve been using the detector at https://github.com/timesler/facenet-pytorch which seems pretty quick, for sure compared to what I was using before. There’s a good notebook on its use at https://www.kaggle.com/timesler/guide-to-mtcnn-in-facenet-pytorch.
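The basic batched usage from that library looks roughly like this; `clip_box` is a hypothetical helper of mine for adding a margin around the detections, not part of facenet-pytorch:

```python
def clip_box(box, width, height, margin=0):
    """Expand an (x1, y1, x2, y2) box by a margin and clip it to the image."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))

def detect_faces(frames, device="cuda:0"):
    """Run facenet-pytorch's MTCNN on a batch of frames.

    `frames` can be a list of PIL images or a uint8 array of equally sized
    frames; returns per-frame arrays of boxes (None where no face is found).
    """
    from facenet_pytorch import MTCNN
    mtcnn = MTCNN(keep_all=True, device=device)
    boxes, probs = mtcnn.detect(frames)
    return boxes
```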
Late to the party.
I use BlazeFace. To save images, I used a torchvision.transforms.ToPILImage pipeline, and for inference I use
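I don't know the exact BlazeFace wrapper used above, but the saving side with `torchvision.transforms.ToPILImage` could look like this; `int_box` and `save_face` are hypothetical helpers, and torch/torchvision are assumed available:

```python
def int_box(box):
    """Round a float (x1, y1, x2, y2) box to integer pixel coordinates."""
    return tuple(int(round(v)) for v in box)

def save_face(frame, box, out_path):
    """Crop one detected face from an HxWx3 uint8 array and save it to disk
    via the ToPILImage route mentioned above."""
    import torch
    from torchvision.transforms import ToPILImage
    x1, y1, x2, y2 = int_box(box)
    crop = torch.from_numpy(frame[y1:y2, x1:x2]).permute(2, 0, 1)  # CHW layout
    ToPILImage()(crop).save(out_path)
```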
I saw a comparison of OpenCV, DALI, and Decord somewhere, and the first was the most robust against errors.