Combining text and image into one model


I’m working on a classification problem with a large number of classes (about 500). Each item in my dataset has both text (such as an opinion) and an image. I want to combine these two features to get better results. Is it possible to do this with the fastai library? Or should I train two separate models and then combine their outputs in some way?
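
To make the second option concrete, here is a rough sketch of what I mean by combining outputs: take the per-class probabilities from a text model and an image model and compute a weighted average. This is plain Python with no fastai calls. The names `fuse_probabilities`, `predict_class`, and `text_weight` are made up for illustration, and the probabilities are toy values for 3 classes rather than about 500.

```python
from typing import List, Sequence


def fuse_probabilities(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,  # hypothetical knob; would need tuning on validation data
) -> List[float]:
    """Weighted average of two per-class probability vectors (late fusion)."""
    if len(text_probs) != len(image_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    image_weight = 1.0 - text_weight
    return [
        text_weight * t + image_weight * i
        for t, i in zip(text_probs, image_probs)
    ]


def predict_class(probs: Sequence[float]) -> int:
    """Index of the class with the highest fused probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


# Toy outputs for one item: the two models disagree on the top class.
text_probs = [0.6, 0.3, 0.1]   # assumed output of a text classifier
image_probs = [0.2, 0.7, 0.1]  # assumed output of an image classifier

fused = fuse_probabilities(text_probs, image_probs)
predicted = predict_class(fused)
print(fused, predicted)
```

With equal weights the image model wins here because it is more confident about its top class. I’m not sure whether this kind of averaging is good enough, or whether a single model that sees both inputs would do better.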

Best regards