How to add categorical data to NLP generation (related to lesson 4 video)?


I’m currently working on generating an action plan from a set of categorical data in the Mental Health domain. The database I’m working with has approximately 150 categorical fields (demographics, rankings on a 1-10 scale, etc.) and a single free-text “action plan” field. It is small: about 15,000 such plans. The question I’m trying to answer is whether a deep learning model could automatically generate a sample plan once a user has filled in all the categorical fields.

In the lesson 4 video (part 1, v2, around minute 1:28), Jeremy demonstrates how to use a model trained on scientific papers to generate a scientific abstract given the first line of the abstract. This looks a lot like the problem I’m trying to solve. In the example he provides a few metadata tags in the first line - for example " cscv algorithms. on"

I have 3 questions:

  1. Is this a standard way to incorporate categorical information into an NLP model?
  2. Would it be feasible to do this with 150 categorical variables or is there a better way to incorporate these variables?
  3. Since our database is fairly small, I was thinking about first training the model on some domain-specific journals etc. and then feeding in our data - is this a reasonable approach, or is it just too little data to do anything reasonable with?
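
To make question 2 concrete, here is a minimal sketch of what I have in mind: each categorical field/value pair becomes one token in a first line, followed by the plan text, so the language model sees the fields the same way the lesson's model sees its metadata tags. The marker tokens `xfld` and `xplan`, the field names, and the helper functions are all made up for illustration; they are not from the lesson or from any library, and I have not checked how a real tokenizer would split them.

```python
import re

# Hypothetical marker tokens (my own invention, not from the lesson notebook).
FIELD_TOKEN = "xfld"   # prefixes every categorical field/value token
BODY_TOKEN = "xplan"   # separates the categorical prefix from the free text


def _slug(value):
    """Lowercase a value and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", str(value).lower()).strip("_")


def fields_to_prefix(record):
    """Serialize a dict of categorical fields into one line of tokens.

    Fields are emitted in sorted order so the same record always
    produces the same prefix.
    """
    parts = []
    for name in sorted(record):
        parts.append(f"{FIELD_TOKEN}_{_slug(name)}_{_slug(record[name])}")
    return " ".join(parts)


def to_training_text(record, plan):
    """Build one training example: categorical prefix, marker, then the plan."""
    return f"{fields_to_prefix(record)} {BODY_TOKEN} {plan.strip()}"


def to_prompt(record):
    """Build the seed text to generate from once the fields are filled in."""
    return f"{fields_to_prefix(record)} {BODY_TOKEN}"


if __name__ == "__main__":
    # Made-up example record with two of the ~150 fields.
    record = {"Age band": "25-34", "Anxiety rank": 7}
    print(to_training_text(record, "Weekly check-in with a counsellor."))
    print(to_prompt(record))
```

With 150 fields the prefix would be 150 tokens long for every plan, which is part of why I am unsure this scales.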

Thank you for any advice anyone has to offer!


Did you figure out the best approach to incorporate categorical data?