Efficient MP4 to Images for training

(Jay Franck) #1

Hi folks,

I am working on a classification project with a large number (1000s) of short videos, about 20 GB of data in total. I have been trying to find an efficient way of processing the videos so they can be used with Fast.AI’s DataSet loader. My basic idea:

  1. Load each video with OpenCV
  2. Grab a few frames per second
  3. Either save as JPEG (StackOverflow1) or convert directly to tensors (StackOverflow2) and save to disk
  4. Train classifier on these transformed images.
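
For reference, steps 1 to 3 could look roughly like the sketch below. It is only an illustration under some assumptions: the function names (`sampling_step`, `extract_frames`), the 2 frames per second target, the JPEG quality of 90 and the output file naming are all made up here, not taken from the linked answers. It relies on OpenCV's `VideoCapture` and `imwrite`.

```python
from pathlib import Path


def sampling_step(native_fps: float, target_fps: float) -> int:
    """How many source frames to advance per saved frame (always at least 1)."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(native_fps / target_fps))


def extract_frames(video_path: str, out_dir: str,
                   target_fps: float = 2.0, jpeg_quality: int = 90) -> int:
    """Save roughly target_fps frames per second of video as JPEGs.

    Returns the number of frames written. Names and defaults are hypothetical.
    """
    import cv2  # imported here so sampling_step can be used without OpenCV

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"could not open {video_path}")

    # Some containers report 0 fps; fall back to an assumed 30 fps.
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = sampling_step(native_fps, target_fps)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem

    saved = 0
    index = 0
    while True:
        ok, frame = cap.read()  # sequential read, frame is a BGR uint8 array
        if not ok:
            break
        if index % step == 0:
            target = out / f"{stem}_{index:06d}.jpg"
            cv2.imwrite(str(target), frame,
                        [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
            saved += 1
        index += 1

    cap.release()
    return saved
```

Reading sequentially and skipping frames avoids seeking, which tends to be the slow part with compressed video.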

For an 11 MB video, I got roughly 2 GB when trying to save as a numpy tensor (‘uint8’) or as JPEGs.

This seems like a really inefficient process to me. Any guidance on how I might better approach this project? Thanks!


(Konwn Ad) #2

How much did you get when saving as JPEGs? It should be smaller than saving as numpy arrays.
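
To put rough numbers on the uncompressed case: a raw uint8 frame costs height × width × 3 bytes no matter what is in it. The sketch below just does that arithmetic. The resolution, frame rate and duration are assumptions picked for illustration (I don't know the actual properties of your clips), and the function names are made up.

```python
def raw_frame_bytes(height: int, width: int, channels: int = 3) -> int:
    """Size in bytes of one uncompressed uint8 frame."""
    return height * width * channels


def raw_video_bytes(n_frames: int, height: int, width: int, channels: int = 3) -> int:
    """Size in bytes of n_frames uncompressed uint8 frames."""
    return n_frames * raw_frame_bytes(height, width, channels)


# Hypothetical clip: 1280x720, 30 fps, 25 seconds, every frame kept.
n_frames = 30 * 25
size = raw_video_bytes(n_frames, height=720, width=1280)
print(f"{size / 1e9:.2f} GB")  # prints "2.07 GB"
```

So a short clip can plausibly reach the GB range as raw arrays. JPEGs of the same frames should come out well below that, which is why the JPEG figure is worth checking.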

My suggestion would be to actually find a way to keep them compressed as videos, since that’s the best compression you can get. Without it, you’d lose the temporal encoding/compression.

You could write a DataLoader that keeps the file path of each video along with its number of frames (you can get that using ffprobe or OpenCV). How you sample from them depends on your application. If you’re doing video classification, you will probably sample videos from the video list, and from each video extract a snippet of, say, 25 frames, making a tensor of [N, 25, 255, 255, 3]. An alternative would be to take the total number of frames across all videos and sample a single integer that maps to a video + frame_id. However, I think the first option is better because it selects a more diverse set of videos when some have more frames than others.
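
A minimal sketch of the first option, assuming you already have a list of frame counts per video. `sample_snippet` and `read_snippet` are hypothetical names, not part of Fast.AI or PyTorch. The reader uses OpenCV's `CAP_PROP_POS_FRAMES` to seek, which may not land on the exact frame for every codec, so treat it as a starting point.

```python
import random
from typing import List, Tuple


def sample_snippet(frame_counts: List[int], snippet_len: int,
                   rng: random.Random) -> Tuple[int, int]:
    """Pick a video uniformly at random, then a start frame so the snippet fits.

    Sampling the video first (not a global frame index) keeps long videos
    from dominating the batches.
    """
    eligible = [i for i, n in enumerate(frame_counts) if n >= snippet_len]
    if not eligible:
        raise ValueError("no video is long enough for the requested snippet")
    video_idx = rng.choice(eligible)
    start = rng.randint(0, frame_counts[video_idx] - snippet_len)  # inclusive
    return video_idx, start


def read_snippet(path: str, start: int, snippet_len: int):
    """Decode snippet_len consecutive frames into a [snippet_len, H, W, 3] array."""
    import cv2  # imported here so sample_snippet works without OpenCV
    import numpy as np

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"could not open {path}")
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)  # one seek, then sequential reads

    frames = []
    for _ in range(snippet_len):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    if len(frames) != snippet_len:
        raise IOError(f"{path}: got {len(frames)} frames, wanted {snippet_len}")
    return np.stack(frames)
```

Calling `sample_snippet` N times and stacking the results of `read_snippet` gives the [N, 25, H, W, 3] batch, provided every frame is resized to the same H and W first.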

It’s a bit expensive to make random accesses in a video (videos are better suited to sequential reads), but pre-fetching with more workers will help with throughput.
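
To illustrate the pre-fetching idea, here is a small standard-library sketch that keeps a few loads in flight on worker threads while the consumer works on earlier results. The `prefetch` helper is hypothetical, and whether threads actually help depends on the decoder releasing the GIL, which I believe OpenCV generally does but have not measured here. PyTorch's `DataLoader` does the process-based version of this through its `num_workers` argument.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch(load: Callable[[T], R], items: Iterable[T],
             workers: int = 4, depth: int = 8) -> Iterator[R]:
    """Yield load(item) for each item, in order, keeping up to `depth` loads in flight."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        for item in items:
            pending.append(pool.submit(load, item))
            if len(pending) >= depth:
                # Block only on the oldest load; newer ones keep running.
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()
```

Here `load` would be whatever function decodes one snippet, and `items` the sampled (video, start frame) pairs.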

(Jay Franck) #3

Thank you for the great response @konwnad. Coding up my own dataloader sounds intimidating, but I will check out the source code to see if I can grok it.


(Matthijs) #4

I did some work with video recently and used ffmpeg to export the frames to JPEG. However, upon closer inspection, those images contained unwanted artifacts (even with minimal compression). Exporting as PNG solved this but it obviously also made the image files much larger.

So if you’re going the JPEG route, make sure the exported frames are good enough quality.
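
For what it's worth, an export along these lines can be scripted as below. This is a sketch, not the exact command I ran: the helper names, the 2 fps rate and the file patterns are placeholders. It uses ffmpeg's `fps` video filter and the `-q:v` option, where lower values mean higher JPEG quality (2 is commonly treated as the practical best). Even at that setting you may still see artifacts, so inspect a few frames.

```python
import subprocess
from typing import List


def ffmpeg_export_cmd(video_path: str, out_pattern: str,
                      fps: float = 2.0, jpeg_qscale: int = 2) -> List[str]:
    """Build an ffmpeg command that exports frames at `fps` frames per second.

    The quality flag is only added for JPEG output; PNG is lossless.
    """
    cmd = ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}"]
    if out_pattern.lower().endswith((".jpg", ".jpeg")):
        cmd += ["-q:v", str(jpeg_qscale)]
    cmd.append(out_pattern)
    return cmd


def export_frames(video_path: str, out_pattern: str, **kwargs) -> None:
    """Run the export; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_export_cmd(video_path, out_pattern, **kwargs), check=True)


# Example (hypothetical paths):
#   export_frames("clip.mp4", "frames/clip_%05d.jpg")
#   export_frames("clip.mp4", "frames/clip_%05d.png")  # larger, no JPEG artifacts
```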

(Personally, I’d try writing a data loader that decompresses the movies on-the-fly, as suggested earlier.)
