Targets:
- to classify questions into different topics
- each question has at most five topics
- 3M questions
- 2000 topics
Dataset info:
- question title
- question description
- topic title
- topic description
- topic relationships (a DAG); for example, the topic "programming" is the parent of the topics "Python programming" and "C programming"
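To make the parent relationship concrete, here is a minimal sketch of how the topic DAG could be held in memory and queried for all ancestors of a topic. The edge list, the (child, parent) format, and the function names are assumptions for illustration only; the real dataset's IDs and file layout are not shown here.

```python
from collections import defaultdict

# Hypothetical (child, parent) edges taken from the example above.
# The real data presumably uses topic IDs rather than titles.
EDGES = [
    ("Python programming", "programming"),
    ("C programming", "programming"),
]


def build_parents(edges):
    """Map each topic to the set of its direct parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return parents


def ancestors(topic, parents):
    """Return every topic reachable by following parent links upward."""
    seen = set()
    stack = [topic]
    while stack:
        node = stack.pop()
        # .get avoids creating empty entries for root topics.
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents = build_parents(EDGES)
```

Calling `ancestors("Python programming", parents)` on this toy data gives the set containing only "programming". Because a DAG allows several parents per topic, the mapping stores a set of parents rather than a single one.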
What I did:
Using two models (TextCNN and Hierarchical Attention Network) with question titles and question descriptions as input, I get decent results: the top-5 predictions cover about 55% of the topics.
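For reference, this is roughly how I read the "coverage" number: the fraction of each question's labeled topics that appear among its five highest-scoring predictions, pooled over all questions. This is only a sketch of that interpretation; the function name `top_k_coverage` and the input format (one score dictionary per question) are assumptions, not the actual evaluation script.

```python
def top_k_coverage(scores, true_topics, k=5):
    """Fraction of labeled topics found in the top-k predictions.

    scores: list of dicts mapping topic -> predicted score, one per question.
    true_topics: list of sets of labeled topics, one per question.
    """
    hits = 0
    total = 0
    for question_scores, truth in zip(scores, true_topics):
        # Take the k topics with the highest predicted score.
        top = sorted(question_scores, key=question_scores.get, reverse=True)[:k]
        hits += len(truth & set(top))
        total += len(truth)
    return hits / total if total else 0.0
```

With toy input such as `top_k_coverage([{"a": 0.9, "b": 0.1, "c": 0.5}], [{"a", "b"}], k=2)`, the top two predictions are "a" and "c", so one of the two labeled topics is covered.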
However, I have no idea how to use the topic information (title, description, and DAG) to improve my score.
How can I use topic information in text classification?
Thank you.