Video with fastai, and image + text

Hi Everyone,

Could anyone point me to resources for using fastai for video captioning? Image captioning work would be great too. My interest is in combining text inputs with image/video inputs. @muellerzr, what is the standard pipeline for that?

A quick and easy way (and the right approach) is described here: Combining Tabular + Images in fastai2 (it should work with almost any other data type too).

Basically, you make two DataLoaders, one for the text and one for the images, then combine them via my MixedDL. IMO it's by far the easiest approach, rather than fighting the API.
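To make the "two loaders, combined" idea concrete, here is a framework-free sketch. It is not the MixedDL code from the linked post and uses no fastai API: `LockstepLoader` and `make_batches` are hypothetical names, and plain lists of `(x, y)` batches stand in for real DataLoaders. It assumes both loaders yield the same samples in the same order (for example, no independent shuffling), which is what lets each step be merged into one `(image_batch, text_batch, target)` tuple.

```python
from typing import Any, Iterator, List, Sequence, Tuple


def make_batches(xs: Sequence[Any], ys: Sequence[Any], bs: int) -> List[Tuple[list, list]]:
    """Hypothetical helper: split parallel sequences into (x_batch, y_batch) pairs."""
    return [(list(xs[i:i + bs]), list(ys[i:i + bs])) for i in range(0, len(xs), bs)]


class LockstepLoader:
    """Illustrative only (not the real MixedDL): walk several loaders in lockstep
    and merge each step into one batch of (input_1, ..., input_n, target)."""

    def __init__(self, *loaders):
        self.loaders = loaders

    def __len__(self) -> int:
        # zip() stops at the shortest loader, so report that length
        return min(len(dl) for dl in self.loaders)

    def __iter__(self) -> Iterator[Tuple[Any, ...]]:
        for batches in zip(*self.loaders):
            inputs = tuple(x for x, _ in batches)
            target = batches[0][1]
            # Assumption: every loader sees the same samples in the same order,
            # so the targets must match; fail loudly if they do not.
            if any(y != target for _, y in batches[1:]):
                raise ValueError("loaders are out of sync: targets differ")
            yield (*inputs, target)


# Usage: one "image" loader and one "text" loader over the same four samples
labels = ["cat", "dog", "cat", "bird"]
image_feats = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.5, 0.5]]
captions = ["a cat", "a dog", "another cat", "a bird"]

mixed = LockstepLoader(
    make_batches(image_feats, labels, bs=2),
    make_batches(captions, labels, bs=2),
)
for img_batch, txt_batch, y in mixed:
    print(len(img_batch), txt_batch, y)
```

Keeping the two pipelines separate means each modality keeps its own transforms and batching; the only coupling is the shared sample order, which is the part a real combined loader has to guarantee.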


Oohh, thanks. I’ll take a look. Just a quick question: would you have one of your .ipynb notebooks for this? I’m looking through your Practical Computer Vision 2.0 GitHub repo but can’t find it. Maybe I’m not looking in the right place?

Just the post for now. I made a Kaggle kernel (linked in that post), but in the next week or two I’ll add one under the fastinference section of that repo.