If one is using ULMFiT for text classification, for example to classify news documents into categories (sports, politics, humor), the target labels are typically one-hot encoded. It could be beneficial to represent these labels with their word embeddings instead, because:
- The target label's word can itself occur in the document
- There can be word similarities between the target label and words in the document.
By one-hot encoding these labels, this information is lost. But I don’t see a way to incorporate such label representations into ULMFiT or into transformers like BERT.
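One direction I could imagine (purely a sketch, with made-up embedding vectors): replace the one-hot softmax head with a scoring layer that compares the document representation to the label words' embeddings, e.g. by cosine similarity. In a real setup the vectors would come from the language model's embedding matrix; here they are hypothetical:

```python
import numpy as np

# Hypothetical pretrained word embeddings for the label words
# (in practice, rows of the LM's embedding matrix).
labels = ["sports", "politics", "humor"]
label_embeddings = np.array([
    [0.9, 0.1, 0.0],   # "sports"
    [0.0, 0.8, 0.2],   # "politics"
    [0.1, 0.1, 0.9],   # "humor"
])

def classify(doc_vector: np.ndarray) -> str:
    """Pick the label whose word embedding has the highest
    cosine similarity with the document vector, instead of
    taking a softmax over one-hot classes."""
    sims = label_embeddings @ doc_vector
    sims = sims / (np.linalg.norm(label_embeddings, axis=1)
                   * np.linalg.norm(doc_vector))
    return labels[int(np.argmax(sims))]

# A document vector that happens to lie near the "sports" embedding.
doc = np.array([0.8, 0.2, 0.1])
print(classify(doc))  # prints "sports"
```

This would let a document mentioning words similar to the label word score higher for that label, which is exactly the information a one-hot target throws away. Whether this can be bolted onto the ULMFiT fine-tuning head is the part I'm unsure about.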
Thoughts?