There is a Kaggle competition from CareerVillage.org, a platform that connects questions posted by users with suitable professionals who might be interested in answering them.
The data contains over 10,000 previously posted questions. Most questions have 1 or 2 answers; a few have up to 30. Both questions and answers have attributes such as ‘likes’, hashtags, and date and time. There are also attributes for the users who posted the questions and for the professionals who answered them, such as joining date, profession, interests and groups joined, plus (for professionals) a record of previous emails with suggested questions.
Professionals are notified of new open questions through daily and weekly emails, and through the groups and hashtags they follow. All of this information is available.
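Because professionals see questions through the hashtags they follow, one simple content-based starting point is to score each professional by how much the question's hashtags overlap with the hashtags they follow. The sketch below uses Jaccard similarity for that overlap; the function names, the choice of Jaccard, and the toy data are all hypothetical illustrations, not part of the competition data or any official baseline.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_professionals(question_tags, followed_tags_by_pro, top_k=3):
    """Hypothetical helper: rank professionals by hashtag overlap with a question.

    question_tags: iterable of hashtags on the question.
    followed_tags_by_pro: dict mapping professional id -> iterable of followed hashtags.
    Ties are broken alphabetically by professional id so the output is deterministic.
    """
    q = set(question_tags)
    scores = {pro: jaccard(q, set(tags)) for pro, tags in followed_tags_by_pro.items()}
    ranked = sorted(scores, key=lambda pro: (-scores[pro], pro))
    return ranked[:top_k], scores


# Toy example with made-up ids and hashtags.
top, scores = rank_professionals(
    {"python", "data-science"},
    {
        "p1": {"python", "data-science", "statistics"},  # 2 shared / 3 total
        "p2": {"nursing"},                               # 0 shared
        "p3": {"python"},                                # 1 shared / 2 total
    },
    top_k=2,
)
print(top)  # ['p1', 'p3']
```

A score like this ignores the question body and the professional's answer history, so it is best seen as one feature among several rather than a complete model.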
The goal is to find a sensible way to predict, for each question, whether a specific professional would be interested in answering it.
Is there a logical or mathematical way to approach such a problem? I’m not really sure where to start. For each unique question there are only a couple of observations: most professionals did not answer THAT specific question, and usually only one or two did.
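One common way to handle this sparsity is to frame the task as binary classification over (question, professional) pairs: every recorded answer is a positive pair, and a few randomly sampled professionals who did not answer that question become negative pairs. The sketch below assumes the answers are available as a list of (question_id, professional_id) tuples; the function name, the sampling ratio, and the toy ids are hypothetical. Note that "did not answer" is only a proxy for "not interested", since many professionals never saw the question at all.

```python
import random


def build_pairs(answers, all_professionals, negatives_per_positive=2, seed=0):
    """Hypothetical helper: build labelled (question, professional, label) rows.

    answers: list of (question_id, professional_id) tuples, one per recorded answer.
    all_professionals: iterable of every professional id.
    For each question, sample up to negatives_per_positive * (#answerers)
    professionals who did NOT answer it and label them 0.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    answered = {}
    for q, p in answers:
        answered.setdefault(q, set()).add(p)

    pros = sorted(all_professionals)
    rows = []
    for q, answerers in sorted(answered.items()):
        rows.extend((q, p, 1) for p in sorted(answerers))
        candidates = [p for p in pros if p not in answerers]
        k = min(len(candidates), negatives_per_positive * len(answerers))
        rows.extend((q, p, 0) for p in rng.sample(candidates, k))
    return rows


answers = [("q1", "p1"), ("q2", "p2"), ("q2", "p3")]
rows = build_pairs(answers, {"p1", "p2", "p3", "p4", "p5"})
print(len(rows), sum(label for _, _, label in rows))  # 8 3
```

Each row can then be enriched with features (for example the hashtag overlap, text similarity between the question and the professional's past answers, or whether the question was emailed to them) and fed to any standard classifier whose predicted probability is used to rank professionals per question.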
PS: I’m interested in the competition because it seems like a good dataset for practicing with text. The concept of matching questions and answers could also easily be generalized to topics outside career choice.