NLP on multi-field text objects

I’m using the text module to categorize medium-sized pieces of text. Each item to be categorized has several free-form text fields, in addition to several categorical fields. What is the best approach to using all this information for categorization? I see two options:

  1. Concatenate the text from all fields, preceding each field's content with a special token. Run the concatenated text through a single LSTM.
  2. Train one model per field. Concatenate the outputs of these models in a hidden layer and pass the result into subsequent layers.
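
To make option 1 concrete, here is a minimal sketch of the concatenation step as I picture it. The marker tokens `xxbos` and `xxfld` and the helper `join_fields` are names I made up for illustration, not a confirmed library API; any tokens that cannot occur in the real text would do.

```python
# Hypothetical marker tokens; the names are assumptions, not a library API.
BOS = "xxbos"   # marks the start of an item
FLD = "xxfld"   # marks the start of a field, followed by the field index


def join_fields(fields):
    """Concatenate text fields into one string, marking where each field starts."""
    parts = [BOS]
    for i, text in enumerate(fields, start=1):
        # The index after the marker lets the model tell the fields apart.
        parts.append(f"{FLD} {i} {text.strip()}")
    return " ".join(parts)


item = ["Screen cracked on arrival", "The box was dented and the glass is broken."]
joined = join_fields(item)
print(joined)
# xxbos xxfld 1 Screen cracked on arrival xxfld 2 The box was dented and the glass is broken.
```

The joined string would then be tokenized and fed to a single LSTM, so the categorical fields would still need to be handled separately (or rendered as text).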

What are the benefits of each of the approaches? Is there an alternative I’m missing?

For option 1, I found an earlier suggestion to create a custom ItemList implementation here, though it did not succeed. I also found another, successful attempt at creating a combined structured + text learner. Are these the right direction?
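
For comparison, this is roughly what I have in mind for option 2, extended to take the categorical fields as well. It is a plain PyTorch sketch under my own assumptions, not the code from either thread: the class name `MultiFieldClassifier`, the layer sizes, and the use of each LSTM's last hidden state as the field summary are all illustrative choices.

```python
import torch
import torch.nn as nn


class MultiFieldClassifier(nn.Module):
    """One LSTM encoder per text field, plus embeddings for categorical fields."""

    def __init__(self, vocab_size, n_text_fields, cat_cardinalities, n_classes,
                 emb_dim=32, hidden=64, cat_emb_dim=8):
        super().__init__()
        # Separate token embedding and LSTM for every text field.
        self.text_embs = nn.ModuleList(
            [nn.Embedding(vocab_size, emb_dim, padding_idx=0) for _ in range(n_text_fields)])
        self.lstms = nn.ModuleList(
            [nn.LSTM(emb_dim, hidden, batch_first=True) for _ in range(n_text_fields)])
        # One embedding table per categorical field.
        self.cat_embs = nn.ModuleList(
            [nn.Embedding(card, cat_emb_dim) for card in cat_cardinalities])
        joint = n_text_fields * hidden + len(cat_cardinalities) * cat_emb_dim
        self.head = nn.Sequential(
            nn.Linear(joint, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, text_fields, cats):
        # text_fields: list of LongTensors, each (batch, seq_len_i)
        # cats: LongTensor of shape (batch, n_categorical_fields)
        feats = []
        for emb, lstm, tokens in zip(self.text_embs, self.lstms, text_fields):
            _, (h_n, _) = lstm(emb(tokens))
            feats.append(h_n[-1])              # last layer's final state: (batch, hidden)
        for i, emb in enumerate(self.cat_embs):
            feats.append(emb(cats[:, i]))      # (batch, cat_emb_dim)
        # Concatenate all per-field features and classify.
        return self.head(torch.cat(feats, dim=1))


model = MultiFieldClassifier(vocab_size=100, n_text_fields=2,
                             cat_cardinalities=[5, 3], n_classes=4)
title = torch.randint(1, 100, (8, 12))   # fake token ids for a short field
body = torch.randint(1, 100, (8, 40))    # fake token ids for a longer field
cats = torch.stack([torch.randint(0, 5, (8,)), torch.randint(0, 3, (8,))], dim=1)
logits = model([title, body], cats)
print(logits.shape)  # torch.Size([8, 4])
```

The sketch does not pack padded sequences, so with real variable-length text the final hidden state would also see padding tokens; that would need handling before drawing any conclusions from it.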
