How to add categorical data to NLP generation (related to lesson 4 video)?

Hi,

Background:
I’m working on generating an action plan from a set of categorical data in the mental health domain. The database has approximately 150 categorical fields (demographics, categorical rankings [1-10], etc.) and a single free-text “action plan” field. It is small: about 15,000 such plans. What I’d like to find out is whether a deep learning model could automatically generate a sample plan once a user has filled in all the categorical fields.

Questions:
In the lesson 4 video (part 1, v2, around minute 1:28), Jeremy demonstrates how to use a fast.ai model trained on scientific papers to generate a scientific abstract given the first line of the abstract. This looks a lot like the problem I’m trying to solve. In the example he provides a few metadata tags in the first line, for example " cscv algorithms. on".
I have 3 questions:

  1. Is this a standard way to incorporate categorical information into an NLP model?
  2. Would it be feasible to do this with 150 categorical variables or is there a better way to incorporate these variables?
  3. Since our database is fairly small, I was thinking about first training the model on some domain-specific journals etc. and then fine-tuning it on our data. Is this a reasonable approach, or is it just too little data to do anything reasonable with?
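
To make question 2 concrete, here is a minimal sketch of how the tag-prefix idea from the lesson 4 example might be extended to many categorical fields: each field becomes one token in a prefix line that precedes the free text. Everything in it is an assumption for illustration only: the field names, the `xfld_` and `xplan` marker tokens, and the helper functions are hypothetical and are not fast.ai API.

```python
# Hypothetical sketch: serialize categorical fields into a token prefix
# that is prepended to the free-text plan before language-model training.
# Field names and marker tokens (xfld_, xplan) are made up for illustration.

def fields_to_prefix(record, field_order):
    """Return one space-separated line with one token per categorical field."""
    tokens = []
    for name in field_order:
        value = record.get(name)
        if value is None:
            value = "missing"  # keep a slot even when the field is empty
        # Normalize the value so it survives whitespace tokenization as one token.
        value_token = str(value).strip().lower().replace(" ", "_")
        tokens.append(f"xfld_{name}_{value_token}")
    return " ".join(tokens)


def build_training_text(record, field_order, text_field="action_plan"):
    """Prefix the free text with the field tokens and a separator token."""
    prefix = fields_to_prefix(record, field_order)
    return f"{prefix} xplan {record[text_field]}"


# Illustrative record only; the values are not from the real database.
FIELD_ORDER = ["age_band", "anxiety_rank", "employment"]
example = {
    "age_band": "25-34",
    "anxiety_rank": 7,
    "employment": "Part Time",
    "action_plan": "Schedule weekly check-ins.",
}

print(build_training_text(example, FIELD_ORDER))
```

At generation time the same prefix would be built from the user's answers and the model asked to continue after `xplan`. With 150 fields the prefix becomes long, which is part of why I'm unsure this scales.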

Thank you for any advice anyone has to offer!

Thank you,
Patrick

Hi,
Did you figure out the best approach to incorporate categorical data?