How to best integrate both image and tabular data in a single model using fastai?


(Yuan Tian) #1

Suppose that for each example, we have both image and tabular data. For the images, we can use a CNN-based model, and for the tabular data, we can use embeddings and fully connected layers. With fastai, it is easy to build two separate models for each type of data.

But what if we want to build a single model? My thought is that, for each sample, we load its image and tabular data and feed them to a CNN and a fully connected neural network, respectively. We then concatenate the two outputs and feed them into another fully connected neural network to produce the final predictions.
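To make the idea concrete, here is a minimal sketch of that architecture in plain PyTorch (not the fastai API). The layer sizes, ResNet backbone, and embedding setup are illustrative assumptions, not a definitive implementation:

```python
# Sketch of the concatenation idea: CNN branch + tabular branch -> joint head.
# Backbone choice and layer sizes are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision import models

class ImageTabularModel(nn.Module):
    def __init__(self, n_cont, emb_sizes, n_out):
        super().__init__()
        # CNN branch: pretrained ResNet body with its final classifier removed
        body = models.resnet18(pretrained=True)
        self.cnn = nn.Sequential(*list(body.children())[:-1])  # -> (bs, 512, 1, 1)
        # Tabular branch: one embedding per categorical variable + continuous inputs
        self.embeds = nn.ModuleList([nn.Embedding(c, s) for c, s in emb_sizes])
        n_emb = sum(s for _, s in emb_sizes)
        self.tab = nn.Sequential(
            nn.Linear(n_emb + n_cont, 64), nn.ReLU(), nn.BatchNorm1d(64))
        # Head: fully connected layers over the concatenated features
        self.head = nn.Sequential(
            nn.Linear(512 + 64, 128), nn.ReLU(), nn.Linear(128, n_out))

    def forward(self, img, x_cat, x_cont):
        img_feat = self.cnn(img).flatten(1)                       # (bs, 512)
        emb = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        tab_feat = self.tab(torch.cat([emb, x_cont], dim=1))      # (bs, 64)
        return self.head(torch.cat([img_feat, tab_feat], dim=1))  # (bs, n_out)
```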

I was wondering how to best implement this using fastai. I imagine that we first need to build a custom Data type or DataBunch that can load both image and tabular data and then a model that integrates three neural networks. I would love to hear your thoughts. Thanks!
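For the data side, one simple route (before writing a fully fastai-native ItemList) might be a plain PyTorch Dataset that yields both inputs per example. The column names, image filename field, and transforms below are hypothetical and just show the shape of the idea:

```python
# Hedged sketch: a Dataset returning ((image, categoricals, continuous), label).
# 'fname', cat_cols, cont_cols, label_col are assumed columns in a DataFrame.
from pathlib import Path
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class ImageTabularDataset(Dataset):
    def __init__(self, df, img_dir, cat_cols, cont_cols, label_col):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.cat_cols, self.cont_cols, self.label_col = cat_cols, cont_cols, label_col
        self.tfm = transforms.Compose([transforms.Resize((224, 224)),
                                       transforms.ToTensor()])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, i):
        row = self.df.iloc[i]
        img = self.tfm(Image.open(self.img_dir / row['fname']).convert('RGB'))
        x_cat = torch.tensor(row[self.cat_cols].values.astype('int64'))
        x_cont = torch.tensor(row[self.cont_cols].values.astype('float32'))
        y = torch.tensor(row[self.label_col], dtype=torch.long)
        return (img, x_cat, x_cont), y
```

Since the inputs come back as a tuple, a model like the sketch above can unpack them in its forward method; wrapping the resulting DataLoaders in a DataBunch and Learner should then be possible, though I haven't verified the exact plumbing.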


(Matthew Teschke) #2

I think this is a great question - there are many circumstances where I think we’d want to merge structured metadata with an image or text analysis. For example, in the recent Kaggle Quick Draw competition, it could have been useful to take country into account along with the image drawn, since people from different regions around the world may have different interpretations of a word (e.g., “football”).

I think the approach you suggest makes sense - I don’t know how to do it, but I bet it can be done.