I would like to create an image captioning model using fastai. I haven’t found many resources on doing this apart from a short thread from last year, so I expect I’ll be asking a lot of questions here. COCO 2014 seems to be the usual dataset for this, but if anyone has other suggestions, that would be helpful. For now I have two questions:
- How should I go about doing transfer learning for this? Should I take a regular pretrained CNN and a pretrained RNN and merge them (I’ll figure out how to go about this as I go along), or is there a full pretrained network that I can use out of the box?
- How should I load the data? I have images and a JSON file with captions, so how do I get that into a DataBunch? If there is a tutorial that walks through this, a pointer would also be appreciated.
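
For context on the second question, here is a minimal sketch of how I currently read the captions file. It assumes the JSON follows the COCO captions layout (an `images` list with `id` and `file_name`, and an `annotations` list with `image_id` and `caption`). The function name `captions_by_file` and the sample data are made up for illustration; getting from this dict into a DataBunch is the part I’m unsure about.

```python
import json
from collections import defaultdict


def captions_by_file(coco):
    """Map each image file name to the list of captions written for it.

    Assumes `coco` is a dict in the COCO captions layout:
    coco["images"] holds {"id", "file_name"} entries and
    coco["annotations"] holds {"image_id", "caption"} entries.
    """
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        file_name = id_to_file[ann["image_id"]]
        grouped[file_name].append(ann["caption"].strip())
    return dict(grouped)


# Tiny made-up sample in the same shape as the real annotation file.
sample = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "a.jpg"},
    {"id": 2, "file_name": "b.jpg"}
  ],
  "annotations": [
    {"image_id": 1, "caption": "A dog runs. "},
    {"image_id": 2, "caption": "A red bus."},
    {"image_id": 1, "caption": "A dog on grass."}
  ]
}
""")

pairs = captions_by_file(sample)
for file_name, captions in pairs.items():
    print(file_name, captions)
```

With the real file, `sample` would instead come from `json.load(open(path))`, where `path` points at the downloaded captions JSON.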
Any input is much appreciated! Thanks