I’m using the text module to categorize medium-sized pieces of text. Each item to be categorized has several free-form text fields, in addition to several categorical fields. What is the best approach to using all this information for categorization? I see two options:
- Concatenate the text from all fields, preceding each field's content with a special token. Run the concatenated text through an LSTM.
- Train one model per field. Concatenate the outputs of the models in a hidden layer and pass the result into subsequent layers.
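To make the first option concrete, here is a rough sketch of what I mean by preceding each field with a special token. The `join_fields` helper, the `xxfld` marker string, and the example field names are all just placeholders I made up for illustration, not anything taken from the library.

```python
def join_fields(record, field_order, marker="xxfld"):
    """Concatenate text fields, each preceded by a marker token and its 1-based index.

    `marker` is a hypothetical field-separator token chosen for this sketch.
    """
    parts = []
    for i, name in enumerate(field_order, start=1):
        # Missing or None fields become empty text, but the marker is kept
        # so the model can still see that the field was present and empty.
        text = (record.get(name) or "").strip()
        parts.append(f"{marker} {i} {text}".rstrip())
    return " ".join(parts)


# Hypothetical item with two free-form text fields.
item = {"title": "Broken hinge", "description": "Door will not close."}
combined = join_fields(item, ["title", "description"])
```

The combined string would then be tokenized and fed to a single LSTM, so the model sees one sequence and has to learn what each marker means.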
What are the benefits of each of the approaches? Is there an alternative I’m missing?
For option 1, I see a previous suggestion to create a custom ItemList implementation, without success, here. I see another successful attempt at creating a combined structured + text learner. Are these the right direction?
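For the second option, this is roughly the architecture I have in mind, written in plain PyTorch rather than with the library's own classes. It is only a sketch under my own assumptions: `FieldEncoder`, `MultiFieldClassifier`, all layer sizes, and the random inputs are hypothetical, and a real version would presumably start from pretrained encoders instead of fresh LSTMs.

```python
import torch
import torch.nn as nn


class FieldEncoder(nn.Module):
    """Encodes one text field into a fixed-size vector (final LSTM hidden state)."""

    def __init__(self, vocab_size, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, tokens):  # tokens: (batch, seq_len) of token ids
        _, (h, _) = self.lstm(self.emb(tokens))
        return h[-1]  # (batch, hidden)


class MultiFieldClassifier(nn.Module):
    """One encoder per text field plus embeddings for the categorical fields."""

    def __init__(self, vocab_size, n_text_fields, cat_cardinalities,
                 n_classes, hidden=64, cat_dim=8):
        super().__init__()
        self.encoders = nn.ModuleList(
            [FieldEncoder(vocab_size, hidden=hidden) for _ in range(n_text_fields)]
        )
        self.cat_embs = nn.ModuleList(
            [nn.Embedding(card, cat_dim) for card in cat_cardinalities]
        )
        in_dim = n_text_fields * hidden + len(cat_cardinalities) * cat_dim
        self.head = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, text_fields, cats):
        # text_fields: list of (batch, seq_len) tensors, one per field
        # cats: (batch, n_categorical) tensor of category indices
        feats = [enc(t) for enc, t in zip(self.encoders, text_fields)]
        feats += [emb(cats[:, i]) for i, emb in enumerate(self.cat_embs)]
        # Concatenate all per-field features before the shared layers.
        return self.head(torch.cat(feats, dim=1))


# Smoke test with random data: 2 text fields, 2 categorical fields, 4 classes.
model = MultiFieldClassifier(vocab_size=1000, n_text_fields=2,
                             cat_cardinalities=[5, 3], n_classes=4)
texts = [torch.randint(1, 1000, (8, 20)), torch.randint(1, 1000, (8, 12))]
cats = torch.stack([torch.randint(0, 5, (8,)), torch.randint(0, 3, (8,))], dim=1)
logits = model(texts, cats)
```

The categorical fields go in through their own embeddings here, which is what I understand the combined structured + text learner to be doing, but I have not checked that against its code.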