Tutorial/example for mixed image+tabular+text?

rmkn85 · October 1, 2019, 8:33pm

Hello,

Let’s say I’m starting from images, so I’m using Vision. Then I extract their EXIF data, so now I have images plus many fields with various values, which makes it Tabular(?). On some of these fields I want to use NLP, which makes it Textual as well.

In some cases, the input data itself will have binary files as well as a spreadsheet with extra information.

Some people have already raised similar issues in fastai v1:

So, what is the way in fastai v2 to load data of different kinds and combine it?
DataFrame rows? Tuples? Designated objects? Multiple Datasets?
Can you please provide an example/tutorial? (possibly referring to the challenges mentioned in the above forum posts).

Thanks!!
RK.

jeremy · October 1, 2019, 9:46pm

This has been asked before - it’s not something we’ve built yet, but we plan to work on soon-ish.

Thanks for the helpful links!

fabris · October 2, 2019, 1:19pm

Hi, loading different kinds of data from a variety of sources is tedious work, but it is not too complicated. Doing it efficiently is another story.
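As a minimal sketch of the loading side, assume the extra information lives in a spreadsheet exported as CSV with one row per image file. The column names (`filename`, `iso`, `focal_length`, `caption`), the `images` directory and the `load_items` helper are hypothetical, and nothing here is a fastai API. The sketch just joins each row to its image path and produces one `(image_path, tabular, text)` tuple per item, i.e. the "tuples" option from the question.

```python
import csv
import io
from pathlib import Path

# Hypothetical spreadsheet (CSV) with one row per image file.
# Column names are assumptions for illustration only.
CSV_TEXT = """filename,iso,focal_length,caption
img_001.jpg,100,35.0,sunset over the harbour
img_002.jpg,800,50.0,street market at night
"""

TABULAR_COLS = ["iso", "focal_length"]  # numeric fields, e.g. taken from EXIF
TEXT_COL = "caption"                    # free-text field to feed to an NLP model


def load_items(csv_file, image_dir):
    """Return one (image_path, tabular_features, text) tuple per CSV row."""
    items = []
    for row in csv.DictReader(csv_file):
        image_path = Path(image_dir) / row["filename"]     # binary file on disk
        tabular = [float(row[c]) for c in TABULAR_COLS]    # tabular part
        text = row[TEXT_COL]                               # textual part
        items.append((image_path, tabular, text))
    return items


items = load_items(io.StringIO(CSV_TEXT), "images")
for image_path, tabular, text in items:
    print(image_path, tabular, text)
```

Each tuple can then be handed to modality-specific transforms (open the image, normalise the numbers, tokenize the text) before batching.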

Either way, you will eventually end up with tensors and get to the crucial part: how and when do you combine them? It is not an easy task. Pay attention to the size of your inputs and to how the different components of your model are connected to one another. The simplest, naive approach is to “fuse” (e.g. concatenate) everything at the end, just before your model’s “head”. It sometimes works, so give it a try.
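The "concatenate just before the head" idea (late fusion) can be sketched framework-agnostically in plain Python. The three encoders below are hypothetical stand-ins that return fixed or toy feature vectors; in a real model they would be something like a CNN body, a tabular embedding/MLP block and a text encoder. In PyTorch the fusion step would typically be `torch.cat` along the feature dimension followed by an `nn.Linear` head; this sketch only shows the shape bookkeeping, which is where the "pay attention to the size of your inputs" warning bites: the head's input size must equal the sum of the per-modality feature sizes.

```python
import random

# Hypothetical per-modality encoders (toy stand-ins, not real models).
def image_encoder(image_path):
    return [0.1, 0.2, 0.3, 0.4]                 # pretend 4-dim image features

def tabular_encoder(tabular):
    return [x / 1000.0 for x in tabular]        # pretend normalised tabular features

def text_encoder(text):
    # toy 2-dim text features: word count and character count
    return [float(len(text.split())), float(len(text))]


def concat_fusion(*feature_vectors):
    """Late fusion: concatenate per-modality feature vectors into one."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


class LinearHead:
    """A single linear layer mapping the fused vector to class scores."""

    def __init__(self, in_features, n_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_features)]
                        for _ in range(n_classes)]
        self.bias = [0.0] * n_classes

    def __call__(self, x):
        # The head only works if the fused size matches what it was built for.
        if len(x) != len(self.weights[0]):
            raise ValueError("fused size does not match head input size")
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


img_feats = image_encoder("images/img_001.jpg")
tab_feats = tabular_encoder([100.0, 35.0])
txt_feats = text_encoder("sunset over the harbour")

fused = concat_fusion(img_feats, tab_feats, txt_feats)    # 4 + 2 + 2 features
head = LinearHead(in_features=len(fused), n_classes=3)
scores = head(fused)
print(len(fused), scores)
```

Concatenating at the very end keeps each encoder independent (each can even be pretrained separately), at the cost of letting the modalities interact only inside the head.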

hope it helps

wgpubs · October 18, 2019, 11:17pm

You can read an article I wrote on creating a MixedItemList for text + tabular here:

I’m working on a follow-up article that demonstrates how one may hack the existing fastai learners/models so they can consume such a dataset, and I hope to do it all over again in December with the v2 bits. Btw, any advice on how to do this correctly is much appreciated.

wgpubs · October 31, 2019, 8:21pm

Part 2 of my series on using the DataBlock API is here:

Demonstrates a method to train a model using the custom DataBlock API bits built in Part 1.

Shanmugam · February 28, 2020, 7:28am

Hi,
Thanks for sharing this. I couldn’t add a test df to this. Can you please help me?

wgpubs · February 28, 2020, 6:13pm

Been porting everything to v2 but I can try. What is the problem?

Andreas_Daiminger · April 20, 2020, 9:55am

Hey @wgpubs!

Thanks for the great work. Do you have a working version of a mixed tabular+text model for fastai2?
I would be happy to collaborate on this.
I did something similar based on @quan.tran’s solution with fastai v1.
https://towardsdatascience.com/next-best-action-prediction-with-text-and-metadata-building-an-agent-assistant-81117730be6b

Right now I am diving into fastai2, so I thought it would be cool to update this.

wgpubs · April 20, 2020, 5:18pm

Yah, I’d like to.

I did some work a few months ago but got consumed with other things between then and now :). I don’t have much time right now to work on this personally, but if you’d like to collaborate on this, I’m down! I’ll let you know when I get a bit more free time, but feel free to keep me updated on your work in the meantime.

Thanks -wg