How to use library for data with multiple kinds of inputs?

I’m taking part in a Kaggle competition with tabular, text and image input data. I’ve tried Keras but got bad results, so I shifted to fastai (on part of the data) and it got much better.

But the problem is that fastai doesn’t have a high-level wrapper for data with multiple kinds of inputs. It’s possible to pre-process the image data using some existing model, save the features and treat them as numerical data. However, I would like to first use ~1 dense layer for each kind of data before concatenating them (since the same kind of data is more internally relevant, and different kinds of data suit different dropout rates, etc.), so what should I do?
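A minimal PyTorch sketch of the idea above: one small dense branch (with its own dropout rate) per kind of input, concatenated before a shared head. All layer sizes, dropout rates, and the class name here are made up for illustration.

```python
import torch
import torch.nn as nn

class PerModalityNet(nn.Module):
    """Toy sketch: one dense layer with its own dropout per input kind,
    then concatenate the branch outputs before a shared head."""
    def __init__(self, n_tab=10, n_txt=50, n_img=512, n_out=1):
        super().__init__()
        # separate branch per modality, each with its own dropout rate
        self.tab = nn.Sequential(nn.Linear(n_tab, 16), nn.ReLU(), nn.Dropout(0.1))
        self.txt = nn.Sequential(nn.Linear(n_txt, 32), nn.ReLU(), nn.Dropout(0.5))
        self.img = nn.Sequential(nn.Linear(n_img, 64), nn.ReLU(), nn.Dropout(0.3))
        self.head = nn.Sequential(nn.Linear(16 + 32 + 64, 32), nn.ReLU(),
                                  nn.Linear(32, n_out))

    def forward(self, x_tab, x_txt, x_img):
        feats = torch.cat([self.tab(x_tab), self.txt(x_txt), self.img(x_img)], dim=1)
        return self.head(feats)

model = PerModalityNet()
out = model(torch.randn(4, 10), torch.randn(4, 50), torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 1])
```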


If I just needed to learn PyTorch to do this it would be much easier, as there are lots of tutorials and code available online. I’ve googled for fastai tutorials and code, but could only find a few less useful ones, and didn’t find any code that deals with multiple kinds of inputs.

So if anyone knows of such code using fastai to train a NN that deals with multiple kinds of inputs, could you please share a link? I think I would be able to write proper code with such a reference.


I tried to modify the source code of TabularLearner, but the problem is that it relies on DataBunch, which handles “categorical” and “numerical” data but not images or text, so I got stuck.

You will also need a model that can accommodate all this data together. You may start from here:
and here:
If you like, we could work together; I have the same problem.
I don’t know how to build such a model…
It should take a pair (image, data) and output a numerical value.
Feed the image through a convnet (resnet18 for instance) and the data through a TabularModel, then join everything after the flatten layers. Stack some dense layers and output your desired value.
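A rough sketch of this suggestion, with a tiny conv stack standing in for the resnet18 body (in practice you would use a pretrained resnet18 minus its final classification layer); all sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImageTabularModel(nn.Module):
    """Sketch: conv body for the image, dense branch for the tabular
    data, joined after flattening, then a small stack of dense layers."""
    def __init__(self, n_tab=8, n_out=1):
        super().__init__()
        # stand-in for a pretrained convnet body (e.g. resnet18 without its head)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())          # -> (bs, 16)
        self.tab = nn.Sequential(nn.Linear(n_tab, 16), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32, 16), nn.ReLU(),
                                  nn.Linear(16, n_out))

    def forward(self, img, tab):
        x = torch.cat([self.cnn(img), self.tab(tab)], dim=1)
        return self.head(x)

m = ImageTabularModel()
y = m(torch.randn(2, 3, 64, 64), torch.randn(2, 8))
print(y.shape)  # torch.Size([2, 1])
```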


Thanks for your links, Thomas. I’m certainly willing to collaborate, but since I’m new to fastai I’m not sure how much I can help. I’ll get familiar with these functionalities first, and I would be glad for any other advice on what we should do.


If you check out the thread How to use images and tabular data in one model? I think you’ll find some useful material.


Thanks, those resources are great. I’ve modified some code from that NLP & tabular model; the concatenated model was built, but no matter what I try (including trying to overfit) it always outputs the same value (~the median) for every input. What should I do?
I’ve checked the concatenated dataset, and the features and y correspond correctly.

What do you get if you try the models by themselves? For example, if you only run the NLP model, what do your predictions look like?

I’d start there, because that may indicate whether the problem is with the underlying models or with the concatenation part of the code.
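One quick way to test each model by itself is to check the spread of its predictions over a few batches; a near-zero standard deviation confirms the "always outputs the same value" symptom. This is a generic diagnostic sketch, not code from the thread, and `ConstModel` is a deliberately collapsed stand-in.

```python
import torch
import torch.nn as nn

class ConstModel(nn.Module):
    """Stand-in for a collapsed model that ignores its input."""
    def forward(self, x):
        return torch.zeros(x.shape[0], 1)

def preds_look_constant(model, batches, tol=1e-6):
    """Collect predictions over a few batches and check their spread:
    a near-zero standard deviation means the model outputs (nearly)
    the same value for every input."""
    model.eval()
    with torch.no_grad():
        preds = torch.cat([model(xb) for xb in batches])
    return preds.std().item() < tol

batches = [torch.randn(4, 3) for _ in range(3)]
print(preds_look_constant(ConstModel(), batches))  # True
```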

All of the models (currently all three are tabular: categorical, continuous, and extracted image features) learn and work just fine. I’ve also tried setting the dropout for the categorical and continuous parts to 1 (thus making the concatenated model’s architecture exactly the same as the image model’s), but it still always gives the same output.

I think radek’s notebook is the way to go!

He used create_func to get the dataset, but I would like to use a DataFrame to create the dataset, and create_func doesn’t seem to be used anymore? Could you please share some code (showing how you create the dataset)?

In the quickdraw kernel, create_func was used for separating different kinds of inputs, but now it’s been removed (I got an error reporting “unexpected argument”), so what’s the alternative?

You can now use MixedItemList (graciously coded by @sgugger) to merge different types of ItemList together.

Look at this thread where I try to tackle this exact problem: Custom ItemList, getting ForkingPickler broken pipe

I successfully trained a model with Image + Tabular. Still trying to integrate text now.

You can also see my model at the bottom of this post and how I loaded the data.

I am currently working on integrating text data. I successfully implemented the data collate function for the variable-length text input in the batches. The image, tabular AND text data is correctly passed to the forward function of my Module. But I am running into problems loading a pre-trained text encoder to use in my custom Module… Here is the thread about that if you are interested: Pre-trained text encoder.
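A collate function like the one described can be sketched in plain PyTorch: stack the fixed-size image and tabular tensors, and pad only the variable-length token id sequences to the batch maximum. The sample layout, the function name, and the padding value of 1 are all assumptions for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def mixed_collate(samples):
    """Sketch of a collate_fn for (image, tabular, text_ids, target)
    samples where only the text varies in length: stack the fixed-size
    parts and pad the token id sequences to the batch maximum."""
    imgs = torch.stack([s[0] for s in samples])
    tabs = torch.stack([s[1] for s in samples])
    txts = pad_sequence([s[2] for s in samples], batch_first=True, padding_value=1)
    ys   = torch.stack([s[3] for s in samples])
    return (imgs, tabs, txts), ys

samples = [(torch.randn(3, 8, 8), torch.randn(4), torch.tensor([5, 6, 7]), torch.tensor(0.)),
           (torch.randn(3, 8, 8), torch.randn(4), torch.tensor([8, 9]),    torch.tensor(1.))]
(x_img, x_tab, x_txt), y = mixed_collate(samples)
print(x_txt)  # tensor([[5, 6, 7], [8, 9, 1]]) — shorter row padded with 1
```

You would pass such a function as `collate_fn` to a `torch.utils.data.DataLoader`.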


I have heard about Ludwig being used for such problems before. Feel free to check it out.

Thanks. I found that removing loss_func when creating the learner fixed the problem; the model can be trained now. I think that NLP kernel is a good example to follow.

I’m interested in doing the same but with text and images. I’m also trying to adapt the NLP + tabular notebook, but I’m stuck with the custom collate and forward functions. Could you advise on how you adapted them?

Actually I’m busy with other things these days, and so far I’ve only concatenated image features extracted from a pretrained CNN as tabular data, which didn’t require collate functions. I’m still trying to use the language model’s pre-trained weights for Kaggle. Of course I’ll let you know once I’ve fixed everything.

You guys (@LIBER, @gpng) might be interested in this notebook I wrote where I merge Image + Tabular + Text data all in one neural network: