Image captioning system using fastai

Hello everyone,
I am going through Part 1 of the course and want to work on an interesting idea: generating memes.
My idea is to use an image captioning system that generates humorous captions for a given image.
Can someone point me in the right direction on how to start this project?
I have a script that can scrape data from the website ‘memegenerator.net’, but I am open to other data sources as well.
Also, what kind of deep learning model would be suitable for something like this? I have read about attention-based models, so I would be really glad if someone could suggest ways to implement them using fastai, or suggest other models that could be used for this problem.
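For context, this is roughly how I understand an attention-based captioner to work. A CNN (for example a pretrained ResNet with its classifier head removed) turns the image into a grid of region features, and at each step an LSTM decoder uses additive (Bahdanau-style) soft attention over those regions before predicting the next word of the caption. Below is a minimal plain-PyTorch sketch under those assumptions. The class names, layer sizes, and the 49-region (7x7) feature grid are my own illustrative choices, not anything from fastai, and the CNN encoder is left out. Since fastai is built on PyTorch, a module like this could in principle be passed to a fastai `Learner`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Soft (Bahdanau-style) attention over a grid of CNN image region features."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # projects each image region
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # projects the decoder state
        self.score = nn.Linear(attn_dim, 1)                 # one scalar score per region

    def forward(self, feats, hidden):
        # feats: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        energy = torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1))
        weights = F.softmax(self.score(energy).squeeze(-1), dim=1)  # (batch, num_regions), sums to 1
        context = (weights.unsqueeze(-1) * feats).sum(dim=1)        # weighted sum: (batch, feat_dim)
        return context, weights


class CaptionDecoder(nn.Module):
    """Hypothetical LSTM decoder that attends over image regions at every time step."""

    def __init__(self, vocab_size, embed_dim=256, feat_dim=512, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)  # initial state from the mean image feature
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # Teacher forcing: captions is (batch, seq_len) token ids; at step t the model
        # sees token t and produces logits meant to predict token t + 1.
        mean_feat = feats.mean(dim=1)
        h, c = self.init_h(mean_feat), self.init_c(mean_feat)
        logits = []
        for t in range(captions.size(1)):
            context, _ = self.attn(feats, h)                      # attend using the current state
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, seq_len, vocab_size)


if __name__ == "__main__":
    # Random stand-ins for a 512-channel 7x7 CNN feature map (flattened to 49 regions)
    # and for tokenized meme captions from a 1000-word vocabulary.
    feats = torch.randn(2, 49, 512)
    captions = torch.randint(0, 1000, (2, 12))
    model = CaptionDecoder(vocab_size=1000)
    print(model(feats, captions).shape)  # torch.Size([2, 12, 1000])
```

Training would then be ordinary cross-entropy between these logits and the captions shifted by one token; generating a new caption would feed the model's own predicted word back in at each step instead of the ground-truth token.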