One Hot Encoding


I’m confused about the process in the data block API.

  1. Does the data block API transform the categorical variables into one-hot encoded vectors? When I run data.show_batch, it looks like the categorical variables are now pandas categories.

  2. If not, is there another step in the data block API that transforms the categorical variables into one-hot encoded vectors?

This is my code:


No - nothing is being one-hot encoded. Instead, each categorical column is mapped to integer codes and given its own embedding matrix, in which the model learns a representation of each category. Check out lesson five or look at the lesson notes here:
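
To make the idea concrete, here is a minimal sketch in plain Python (not fastai's actual code). It shows why no one-hot step is needed: looking up a row of an embedding matrix by integer code gives the same result as multiplying a one-hot vector by that matrix. The column levels, the embedding size, and all function names are made up for illustration, and the weights are random rather than learned.

```python
import random

# Hypothetical categorical column with three distinct levels.
levels = ["red", "green", "blue"]
# Map each category to an integer code, similar in spirit to pandas category codes.
code_of = {level: i for i, level in enumerate(levels)}

# Hypothetical embedding matrix: one row of size emb_dim per level.
# In a real model these weights are learned; here they are random.
random.seed(0)
emb_dim = 2
embedding = [[random.uniform(-1, 1) for _ in range(emb_dim)] for _ in levels]


def embed(level):
    """Look up the embedding row by integer code (no one-hot vector is built)."""
    return embedding[code_of[level]]


def one_hot(level):
    """Build the explicit one-hot vector, for comparison only."""
    return [1.0 if i == code_of[level] else 0.0 for i in range(len(levels))]


def one_hot_times_matrix(level):
    """Multiply the one-hot vector by the embedding matrix."""
    vec = one_hot(level)
    return [
        sum(vec[i] * embedding[i][j] for i in range(len(levels)))
        for j in range(emb_dim)
    ]


# The direct lookup and the one-hot matrix product agree for every level.
for level in levels:
    assert embed(level) == one_hot_times_matrix(level)
```

The lookup is just an indexing operation, so the one-hot vectors never have to be materialized. That matters most for columns with many distinct categories.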

Thank you very much!