How can I improve my Text Classification?


Task:

  • classify questions into topics (multi-label)
  • each question has at most five topics
  • 3M questions
  • 2000 topics

Dataset info:

  • question title
  • question description
  • topic title
  • topic description
  • topic relationships (a DAG); for example, the topic "programming" is the parent of the topics "Python programming" and "C programming"
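
To make the DAG concrete, here is a minimal sketch of how the topic relationships could be held in memory and queried for a topic's ancestors. The edge list, the function names `build_parents` and `ancestors`, and the (parent, child) pair format are assumptions for illustration only; they are not the actual layout of the dataset.

```python
from collections import defaultdict

# Hypothetical toy edges as (parent, child) pairs, taken from the example above.
# The real DAG covers about 2000 topics.
EDGES = [
    ("programming", "Python programming"),
    ("programming", "C programming"),
]


def build_parents(edges):
    """Map each child topic to the set of its direct parents."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents


def ancestors(topic, parents):
    """Return every topic reachable by following parent links upward."""
    seen = set()
    stack = [topic]
    while stack:
        node = stack.pop()
        # .get avoids creating empty entries in the defaultdict for root topics.
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    parents = build_parents(EDGES)
    print(ancestors("Python programming", parents))
```

Because a topic in a DAG can have several parents, the traversal tracks visited nodes so that shared ancestors are counted only once.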

What I did:

Using two models (TextCNN and a Hierarchical Attention Network) with question titles and question descriptions as input, I get decent results: the top-5 predictions cover about 55% of the correct topics.
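
To be explicit about the coverage number, here is a minimal sketch of how it could be computed, assuming it means the fraction of all correct topics that appear among each question's five highest-ranked predictions (micro-averaged over questions). The function name `coverage_at_k` and the input format are hypothetical, and the toy data is not from the real dataset.

```python
def coverage_at_k(true_topics, ranked_predictions, k=5):
    """Fraction of all true topics found in the top-k predictions.

    true_topics: one collection of correct topics per question.
    ranked_predictions: one list per question, best prediction first.
    """
    hits = 0
    total = 0
    for truth, ranked in zip(true_topics, ranked_predictions):
        truth_set = set(truth)
        top_k = set(ranked[:k])
        hits += len(truth_set & top_k)
        total += len(truth_set)
    # Guard against an empty evaluation set.
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy data for illustration only.
    truth = [{"programming", "Python programming"}, {"C programming"}]
    preds = [
        ["programming", "t1", "t2", "t3", "t4", "Python programming"],
        ["C programming", "t1", "t2", "t3", "t4"],
    ]
    print(coverage_at_k(truth, preds))
```

Under this definition, a correct topic ranked sixth or lower counts as a miss, so the score depends on both the ranking quality and the cutoff of five.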

However, I don't know how to use the topic information (title, description, and DAG) to improve my score.

How can I use topic information in text classification?

Thank you.