Tabular data and word embeddings

abhibha · May 24, 2020, 12:16pm

Hello everyone
I have data available in two forms. One is tabular, with both categorical and continuous values, and the other is text. I want to create a hybrid feature space that combines embeddings from the text with the features from the tabular data, so that I can pass it to the Tabular learner. We have the TabularList class for the tabular data and the DataBunch class for creating embeddings from the text data. How can I use the two together?
Any help will be appreciated!

jeremyeast · May 25, 2020, 5:04am

You cannot do this easily. I suggest you search for “Mixed models” on this forum. A few people have shared ways to do this. It involves taking the heads of both models and writing a custom tail, all through fastai.
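To make the idea concrete, here is a minimal sketch of the two branches feeding one combined feature vector, written in plain Python rather than fastai. It is an illustration under assumptions, not the approach from the "Mixed models" threads: the toy word vectors, the column names and the function names (text_features, tabular_features, hybrid_features) are all hypothetical, and mean-pooling stands in for whatever a trained text model would produce. In a real mixed model the concatenated vector would go into a trainable custom tail.

```python
from typing import Dict, List, Sequence

# Hypothetical toy word vectors; in practice these would come from a trained
# text model (for example, the pooled output of a language-model encoder).
WORD_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "service": [0.5, 0.5],
}
EMBED_DIM = 2


def text_features(text: str) -> List[float]:
    """Text branch: mean-pool the vectors of known words (zeros if none)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def tabular_features(category: str, categories: Sequence[str],
                     value: float, mean: float, std: float) -> List[float]:
    """Tabular branch: one-hot the categorical column, standardize the continuous one."""
    one_hot = [1.0 if category == c else 0.0 for c in categories]
    return one_hot + [(value - mean) / std]


def hybrid_features(text: str, category: str, categories: Sequence[str],
                    value: float, mean: float, std: float) -> List[float]:
    """Concatenate both branches into a single feature vector."""
    return text_features(text) + tabular_features(category, categories, value, mean, std)


# One example row: a text field, one categorical column, one continuous column.
row = hybrid_features("Good service", "gold", ["silver", "gold"], 30.0, 20.0, 5.0)
print(row)
```

The vector for each row has a fixed length (text dimensions first, then the tabular ones), so it can be fed to any downstream model. The catch is that the text representation is frozen here, whereas a mixed model trains both branches together.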

Please do share your insights here.

abhibha · May 25, 2020, 11:12am

Hey, thanks for the reply! I’ll make sure to share my insights once I have figured it out.

Swanson · June 1, 2020, 12:48am

Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data. Tabular data is the most commonly used form of data in industry. Gradient Boosting Trees, Support Vector Machine, Random Forest, and Logistic Regression are typically used for classification tasks on tabular data.

navneetkrch · June 13, 2020, 1:03am

Waiting for an update on your approach.