Image Captioning With FastAI

Hi everyone,
I would like to create an image captioning model using fastai. I haven’t found many resources on doing this except for a short thread from last year, so I guess I’ll be asking a lot of questions around here. It seems like COCO 2014 is the dataset for this, but if anyone has other suggestions, that would be helpful. For now I have two questions:

  • How should I go about doing transfer learning for this? Should I use a regular pretrained CNN and RNN and merge them (I’ll figure out how to go about this as I go along), or is there a full pretrained network that I can use out of the box?
  • How should I be loading the data? I have images and a JSON file with captions, so how should I go about getting that into a DataBunch? If there is a tutorial out there that goes through this, that would also be appreciated.

Any input is much appreciated! Thanks
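
On the data-loading question above, here is a minimal sketch of turning a captions JSON into (file name, caption) pairs. It assumes the COCO captions layout (an "images" list with "id" and "file_name", and an "annotations" list with "image_id" and "caption"); the function name and the tiny sample are made up for illustration, and a JSON file with a different structure would need different keys.

```python
import json
from collections import defaultdict


def captions_by_file(coco):
    """Group captions by image file name (assumes COCO-style keys)."""
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_file[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)


# Made-up sample standing in for json.load(open(<your captions file>)).
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
  "annotations": [
    {"image_id": 1, "caption": "A dog runs. "},
    {"image_id": 2, "caption": "A red bus."},
    {"image_id": 1, "caption": "Dog on grass."}
  ]
}
""")

grouped = captions_by_file(sample)

# One row per (image, caption) pair, since an image can have several captions.
rows = [(name, cap) for name, caps in grouped.items() for cap in caps]
```

A flat list of pairs like `rows` is easy to put into a table or a custom dataset class, whichever data-loading API you end up using.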

This is what you want
https://github.com/fg91/Neural-Image-Caption-Generation-Tutorial

@muellerzr Wow, that’s exactly what I’m looking for, thanks so much!

I’m working my way through the code there and I am confused as to the values they use for normalizing the images:
transforms.Normalize([0.5238, 0.5003, 0.4718], [0.3159, 0.3091, 0.3216])
My understanding is that each value in the first list is the mean that we would like that channel to have in the normalized image, and each value in the second list is the standard deviation that we would like that channel to have. If this isn’t correct, please let me know.
Where do the actual values come from and how would I compute them myself?
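
For reference, torchvision's `transforms.Normalize(mean, std)` computes `(x - mean) / std` per channel, so the two lists are usually the per-channel mean and standard deviation measured on the training images (with pixel values scaled to [0, 1]), not target values for the output. The values in the tutorial were presumably computed that way on its dataset, though the thread does not confirm it. Below is a rough pure-Python sketch of the computation using running sums; the function names and the toy images are hypothetical, and in practice you would do the same thing with NumPy or PyTorch tensors.

```python
import math


def channel_stats(images):
    """Return per-channel (mean, std) over every pixel of every image.

    `images` is an iterable of images; each image is a list of rows and each
    row is a list of (r, g, b) pixels already scaled to [0, 1].
    """
    n = 0
    total = [0.0, 0.0, 0.0]
    total_sq = [0.0, 0.0, 0.0]
    for image in images:
        for row in image:
            for pixel in row:
                n += 1
                for c in range(3):
                    total[c] += pixel[c]
                    total_sq[c] += pixel[c] ** 2
    mean = [t / n for t in total]
    # Population variance E[x^2] - E[x]^2, clamped at 0 against rounding error.
    std = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(total_sq, mean)]
    return mean, std


def normalize_pixel(pixel, mean, std):
    """Apply the same per-channel formula as Normalize: (x - mean) / std."""
    return [(pixel[c] - mean[c]) / std[c] for c in range(3)]


# Two tiny made-up 1x2 "images" just to exercise the functions.
toy_images = [
    [[(0.0, 0.2, 0.4), (1.0, 0.4, 0.6)]],
    [[(0.0, 0.2, 0.4), (1.0, 0.4, 0.6)]],
]
mean, std = channel_stats(toy_images)
normalized = normalize_pixel((1.0, 0.4, 0.6), mean, std)
```

Accumulating sums this way means the whole dataset never has to be in memory at once. After normalizing with these statistics, each channel of the dataset has roughly zero mean and unit standard deviation.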

Hi muellerzr
Wow, the potential for some of these models is frightening as well as exciting!

mrfabulous1 :smiley::smiley:

This is a very cool project. It looks like it was built with the fastai v1 library. Are any of you aware of anything done since then?

I’m trying to figure out the same thing. I’ll post here if I figure it out.