Video processing and augmentation

Hi all!

I’ve been doing some video processing at work over the last few weeks. I tried fastai and I was impressed! One key element I was looking for was plenty of data augmentation routines. IMHO, what you have in the vision.transform module is way above what TF/Keras have!

I struggled, googled, and read a lot of documentation before constructing my own vision.ImageDataset class. Not very elegant, but it worked.

I saw today that in 1.0.19 a new parameter was added: image_opener. Nice! I put together a small tutorial on how to create a data.DatasetTfm that serves videos, without the need to implement a separate class.

I created a colab notebook here:

Unfortunately the proposed method has some shortcomings: 1) I couldn’t create a new Transform, and 2) I haven’t checked whether the 1.0.19 version of the Transforms can handle 4D (C, D, H, W) images, where D is the “time” dimension and C == 3.
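For reference, the core of such a video opener, stacking D frames channel-wise so the result still looks like a multi-channel image to an image-oriented pipeline, can be sketched without fastai. The function name and the cv2 reading step below are my own illustration, not the notebook’s code:

```python
import numpy as np
import torch

def frames_to_chw_tensor(frames):
    """Stack D frames of shape (H, W, 3) into one (3*D, H, W) float
    tensor in [0, 1], so the video still looks like a multi-channel
    image to an image-oriented pipeline."""
    # In practice the frames would come from e.g. cv2.VideoCapture:
    #   cap = cv2.VideoCapture(path); ok, frame = cap.read()
    stacked = np.concatenate([f.transpose(2, 0, 1) for f in frames], axis=0)
    return torch.from_numpy(stacked.astype(np.float32)) / 255.0

# Dummy 5-frame clip of 64x64 RGB frames:
frames = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]
clip = frames_to_chw_tensor(frames)   # shape (15, 64, 64)
```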

I will be glad if somebody can guide me through these issues.

Hope it helps, and thanks!


I guess that the fourth dimension is time?
And that you would like to transform the video frame by frame using the transform library in fastai?
Or did you hope to sort of stream the whole video through the transforms without touching the dimensions?

Yes! A 4D tensor: (N, C, D, H, W), where N is the batch, C == 3 (colors), and D is the “4th dimension”. Yeah, I should put some code snippets in the post, too.
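To make the shapes concrete, here is a minimal sketch (dummy data, not from the notebook) of how such a batch is assembled:

```python
import torch

# Four clips, each (C, D, H, W): 3 colors, 8 frames, 64x64 pixels.
videos = [torch.rand(3, 8, 64, 64) for _ in range(4)]

# Stacking adds the batch dim N in front, giving (N, C, D, H, W),
# the input layout expected by torch.nn.Conv3d.
batch = torch.stack(videos)
print(batch.shape)  # torch.Size([4, 3, 8, 64, 64])
```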

My goal is to do the video augmentation nice and easy, that is, without writing too much “standalone” code and using fastai features as much as possible (e.g. enumerating through a dataset is already done, no need to re-implement it).
Of course, doing it all at the same time using fastai means better running times (all operations are on tensors, which I suppose can be placed on CUDA).

I am doing one video at a time. No frame-by-frame streaming.

So I see that you take out an image that has an extra dim for the time. np.squeeze on a shape of (1, 200, 200, 3) has no cost because it’s just another view on the same data. Then you would have to call the transforms.

open_image in fastai returns a tensor.
transform takes a tensor packed into fastai’s Image class, like in the docs here:
search for get_ex

So I think you could make your own open_image. Something like:
Step 1:

    def nextImage():
        im = ...  # capture the next frame, like you have done in the notebook
        return Image(pil2tensor(np.squeeze(im), np.float32).div(255))

  • I do not know if the div(255) is required when you only work with coordinate transforms.

Step 2: make the transforms like in the docs: imT = im.apply_tfms(tfms)
Step 3: return the Image to numpy using image2np(imT) * 255
Step 4: add the extra dimension (np.expand_dims) to get back to your 4-dim numpy image
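The four steps above can be sketched end to end. Since I can’t vouch for the exact fastai call signatures here, the fastai-specific parts (Image, pil2tensor, apply_tfms, image2np) are replaced by plain numpy/torch equivalents and marked in comments; treat this as an illustration of the round trip, not as the fastai API:

```python
import numpy as np
import torch

def to_tensor(im):
    # Roughly what fastai's pil2tensor(...).div(255) does for an
    # (H, W, 3) uint8 frame: channels-first float tensor in [0, 1].
    return torch.from_numpy(im.transpose(2, 0, 1).astype(np.float32)) / 255.0

def to_numpy(t):
    # Roughly image2np(t) * 255: back to an (H, W, 3) uint8 frame.
    return np.rint(t.numpy().transpose(1, 2, 0) * 255.0).astype(np.uint8)

frame4d = np.random.randint(0, 256, (1, 200, 200, 3), dtype=np.uint8)
im = np.squeeze(frame4d)        # step 1: drop the time dim (a view, no copy)
t = to_tensor(im)               # fastai would wrap this tensor in an Image
# ... step 2: apply_tfms would run here on the fastai Image ...
back = to_numpy(t)              # step 3: back to numpy
out = np.expand_dims(back, 0)   # step 4: restore the 4th dimension

# Without transforms the round trip is lossless:
print(np.array_equal(out, frame4d))  # True
```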


Hm, maybe I should rephrase my problem somehow:
I need to get a 3D tensor (C, W, H) into the tfms pipeline [C >> 3]. This is done.
I need to get a 4D tensor (3, D, W, H) [where D = C / 3, with C from the formula above] at the input of the 3D convolution layer. This is done (not shown in the previous notebook).
Converting from 3D to 4D is also done.
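Assuming the 3D tensor stacks the frames as contiguous RGB triplets along the channel dimension (my reading of the description; the (W, H) axis order is kept as written above), the 3D→4D conversion itself is just a cheap view plus permute:

```python
import torch

D, W, H = 8, 64, 64
x3d = torch.rand(3 * D, W, H)   # frames stacked along the channel dim

# Split the channel dim into (frames, colors), then move colors first:
x4d = x3d.view(D, 3, W, H).permute(1, 0, 2, 3)  # (3, D, W, H) for Conv3d

# Color c of frame i in the 4D tensor is channel 3*i + c of the 3D one:
print(torch.equal(x4d[1, 2], x3d[3 * 2 + 1]))  # True
```

The permute only changes strides, so no data is copied until .contiguous() is called.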

I don’t know how to inject this 3D→4D operation into the existing fastai pipelines, so that I don’t have to touch the tensors myself.

So, the issue is not how to do the operations, but doing them inside the DataBunch / DatasetTfm.

Sorry for the confusion.

Hi @visoft

I am late to this party. Very curious: did you manage to load videos and apply transforms without adding extra code?

Hi! I don’t work on this problem anymore; at the time, I needed the code written. In the meantime, new features have been added to fastai, like ItemList (especially the one in the vision package), where you can inject your custom file opener and custom collator. So there is less code to write (harder to understand/debug, maybe), but it is far more integrated with the philosophy of fastai.
