I have some images, plus a CSV file containing additional information (features) about them, and I want to do binary classification.
My idea is to use a CNN on the images (already working), use a second network on the tabular data from the CSV file, and merge the two at some layer to produce a single output.
My question is: can I make a learner with two inputs?
Or should I make a single input by concatenating the image and the tabular info, slice it apart in the first layer, and run two branches until the merge?
Is that even possible, or should I stay with two separate models and do some kind of weighted voting on the predictions?
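For reference, the weighted-voting fallback could look roughly like the sketch below. It assumes each model already outputs a probability for the positive class; the function name `weighted_vote`, the weight `w_image`, and the example probabilities are made up for illustration and would need tuning on a validation set.

```python
def weighted_vote(p_image, p_tabular, w_image=0.5):
    """Blend two positive-class probabilities with a convex weight.

    w_image is the (hypothetical) weight given to the image model;
    the tabular model gets the remaining 1 - w_image.
    """
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must be between 0 and 1")
    return w_image * p_image + (1.0 - w_image) * p_tabular


# Example: the image model is fairly confident, the tabular model is not.
p = weighted_vote(0.9, 0.4, w_image=0.7)
label = int(p >= 0.5)  # threshold the blended probability
```

This keeps the two models fully independent, which is simple but means neither one can learn from the other's features.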
I have been wondering about that for a while (in the context of NLP and tabular data). I found two resources recently that show how to do that in fastai.
In the Kaggle Quickdraw competition Radek made some great example notebooks. One of them shows how to combine 4 model architectures together (look at the MixedInputModel).
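To make the "two branches merged at some layer" idea concrete, here is a minimal plain-PyTorch sketch. It is not the code from that notebook: the class name `ImageTabularNet`, the layer sizes, and the input shapes are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn


class ImageTabularNet(nn.Module):
    """Hypothetical two-input model: a small CNN branch plus a tabular branch."""

    def __init__(self, n_tab_features, n_classes=2):
        super().__init__()
        # Image branch: one conv layer, then global average pooling to 16 features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Tabular branch: a single linear layer to 16 features.
        self.tab = nn.Sequential(nn.Linear(n_tab_features, 16), nn.ReLU())
        # Shared head applied to the concatenated features.
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, image, tabular):
        img_feat = self.cnn(image)      # shape: (batch, 16)
        tab_feat = self.tab(tabular)    # shape: (batch, 16)
        merged = torch.cat([img_feat, tab_feat], dim=1)  # shape: (batch, 32)
        return self.head(merged)        # shape: (batch, n_classes)


model = ImageTabularNet(n_tab_features=5)
images = torch.randn(4, 3, 32, 32)   # dummy batch of 4 RGB images
tabular = torch.randn(4, 5)          # dummy batch of 4 rows with 5 features
logits = model(images, tabular)
```

The key point is that `forward` simply takes two arguments, so no slicing of a combined input is needed; the data loader just has to yield both inputs for each batch.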
Thanks very much for the info, very useful! I am a bit curious why the author of the notebook includes the decoder part of the NLP and tabular models before concatenating. I thought you would concatenate only the last embedding layers of the NLP and tabular models and then attach a single decoder on top. Does anyone have any insight on this?