I have a project with a single text vocab but multiple text fields per sample: each sample contains two parts of a conversation (a question and a reply, plus names). I also have some numerical data (age, for instance). I want to build a classifier that uses a language model encoder to encode the two text parts and then adds the numerical information.
This is simple in plain PyTorch: get the embeddings from the language model encoder (one text part at a time), concatenate them with the numerical features, and pass the result to a dense head for classification.
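For reference, here is a minimal sketch of the plain PyTorch version I have in mind. `TinyEncoder` is only a hypothetical stand-in for the real pretrained language model encoder, and all names and sizes (`MultiTextNumericClassifier`, `head_dim`, the vocab size of 100) are assumptions made up for illustration.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained language model encoder."""

    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.output_dim = hidden_dim

    def forward(self, tokens):
        # tokens: (batch, seq_len) tensor of token ids.
        # Note: this simple version does not mask padded positions.
        _, (h_n, _) = self.rnn(self.embedding(tokens))
        return h_n[-1]  # final hidden state, shape (batch, hidden_dim)


class MultiTextNumericClassifier(nn.Module):
    """Encodes two text parts with one shared encoder and adds numeric features."""

    def __init__(self, encoder, n_numeric, n_classes, head_dim=50):
        super().__init__()
        self.encoder = encoder
        in_dim = 2 * encoder.output_dim + n_numeric
        self.head = nn.Sequential(
            nn.Linear(in_dim, head_dim),
            nn.ReLU(),
            nn.Linear(head_dim, n_classes),
        )

    def forward(self, question, reply, numeric):
        q = self.encoder(question)  # one text part at a time
        r = self.encoder(reply)
        features = torch.cat([q, r, numeric], dim=1)
        return self.head(features)


# Random toy batch: 4 samples, question and reply of different lengths, 1 numeric column.
model = MultiTextNumericClassifier(TinyEncoder(vocab_size=100), n_numeric=1, n_classes=3)
question = torch.randint(1, 100, (4, 12))
reply = torch.randint(1, 100, (4, 20))
age = torch.rand(4, 1)
logits = model(question, reply, age)  # shape (4, 3): one row of class scores per sample
```

Both text parts go through the same encoder because they share one vocab; the head only sees the concatenated feature vector.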
However, I’m trying to do this the ‘fastai’ way and want to create a databunch. None of the text databunches support multiple text columns (they just concatenate them into a single list, which is bad for me), and the data block API does not appear to allow me to create a mixed-mode databunch.
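To make the target concrete, this is roughly the batch structure I am after, sketched with a plain PyTorch `Dataset` and `DataLoader` rather than fastai. The names `ConversationDataset`, `collate` and `PAD_IDX` are hypothetical, and the token ids are assumed to be already numericalized with the shared vocab.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

PAD_IDX = 0  # assumed padding id in the shared vocab


class ConversationDataset(Dataset):
    """Keeps the two text columns separate and carries the numeric columns along."""

    def __init__(self, questions, replies, numerics, labels):
        self.questions, self.replies = questions, replies
        self.numerics, self.labels = numerics, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return (
            torch.tensor(self.questions[i], dtype=torch.long),
            torch.tensor(self.replies[i], dtype=torch.long),
            torch.tensor(self.numerics[i], dtype=torch.float),
            self.labels[i],
        )


def collate(batch):
    # Pad each text column on its own, so question and reply are never merged.
    questions, replies, numerics, labels = zip(*batch)
    q = pad_sequence(questions, batch_first=True, padding_value=PAD_IDX)
    r = pad_sequence(replies, batch_first=True, padding_value=PAD_IDX)
    return (q, r, torch.stack(numerics)), torch.tensor(labels)


# Toy data: two samples with made-up token ids and an age column.
ds = ConversationDataset(
    questions=[[5, 6, 7], [8, 9]],
    replies=[[3, 4], [2, 2, 2, 2]],
    numerics=[[31.0], [45.0]],
    labels=[0, 1],
)
loader = DataLoader(ds, batch_size=2, collate_fn=collate)
(q, r, nums), y = next(iter(loader))  # q: (2, 3), r: (2, 4), nums: (2, 1), y: (2,)
```

What I am missing is the fastai equivalent: a databunch whose batches have this `((question, reply, numerics), label)` shape.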
Does anyone have an example or a site I can look at to see how to implement a mixed text (multiple text columns) + numerical model? It would be something like the Tabular databunch/models, which handle categoricals and numericals.