Is it possible to represent text classification labels in a better way?

When using ULMFiT for text classification, for example to assign news documents to categories (sports, politics, humor), the target labels are one-hot encoded. It could be beneficial to represent these labels by their word embeddings instead, because:

  1. The label word itself can occur in the document.
  2. There can be semantic similarities between the label word and other words in the document.
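
To make the two points concrete, here is a minimal sketch of the difference between the two label representations. The 3-dimensional vectors in `toy_vectors` and the helper `nearest_label` are invented purely for illustration; real vectors would come from a pretrained embedding model, and nothing here is part of ULMFiT or BERT.

```python
import numpy as np

# Hypothetical toy word vectors (made up for this example, not from a real model).
toy_vectors = {
    "sports":   np.array([0.9, 0.1, 0.0]),
    "politics": np.array([0.1, 0.9, 0.1]),
    "humor":    np.array([0.0, 0.2, 0.9]),
    "football": np.array([0.8, 0.2, 0.1]),
    "election": np.array([0.2, 0.8, 0.0]),
}
labels = ["sports", "politics", "humor"]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# One-hot labels: each label is a separate axis, so distinct labels are
# orthogonal and have no relation to any word in the document.
one_hot = np.eye(len(labels))


def nearest_label(word):
    """Return the label whose (toy) embedding is most similar to `word`."""
    sims = {label: cosine(toy_vectors[word], toy_vectors[label]) for label in labels}
    return max(sims, key=sims.get)
```

With one-hot vectors, `cosine(one_hot[0], one_hot[1])` is zero for every pair of distinct labels, whereas in the embedding space a document word such as "football" can be compared directly with the label "sports" via `nearest_label("football")`.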

One-hot encoding the labels discards this information. However, I don’t see how such a label representation could be incorporated into ULMFiT or into transformers like BERT.