Question on labeling text for sentiment analysis

How should we label text in cases where we don’t merely want to know if it is positive or negative, but how positive or negative it is?

In my domain, for example, folks want to distinguish between positive and very positive reviews. I was thinking it might be better to label the reviews as a number from 1 (really negative) to 5 (really positive) instead of just using 1 or 0.

Also, given that we are using a pretrained model here, can we do multi-label classification where a review might be classified as ‘positive’, ‘thrilling’ and ‘great special effects’? Basically like the planet competition but for text.

While this does not necessarily answer your question regarding multi-labels, with binary classification I believe you can still figure out how “strong” the sentiment is by examining the probability of the prediction, even if the labels are just binary (positive, negative). As the lesson 1 notebook shows, you can get the prediction probability at inference time and use it as a guide for how strong the prediction is. So, for example, a positive prediction (1) with a probability of 0.9 may indicate that this is a very positive text.
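As a rough sketch of the idea above (plain Python; the function name and the 0.9/0.6 cut-offs are illustrative assumptions, not tuned values), mapping a binary classifier’s positive-class probability to a coarse strength label might look like:

```python
def sentiment_strength(p_positive, strong=0.9, weak=0.6):
    """Map a binary classifier's P(positive) to a coarse strength label.

    The strong/weak thresholds are illustrative, not tuned values.
    """
    if p_positive >= strong:
        return "very positive"
    if p_positive >= weak:
        return "positive"
    if p_positive > 1 - weak:
        return "neutral"
    if p_positive > 1 - strong:
        return "negative"
    return "very negative"

print(sentiment_strength(0.95))  # very positive
print(sentiment_strength(0.5))   # neutral
```

In practice you would want to calibrate those thresholds on held-out data, since raw classifier probabilities are not always well calibrated.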

Hi - you can use a binary classification scheme and get the information you want from the score of the classifier. Or, if you have enough data, you can try n labels (n > 2) - the issue is that with multiple levels for sentiment-analysis-type tasks, it sometimes becomes difficult to get annotators to agree on the label. One thing people have tried is using the stars on review sites as an implicit label, but there’s been work showing the sentiment of reviews doesn’t correlate all that well with how many stars people give… so it makes sense to get labels manually for your reviews, as you suggest.

I think giving n labels which may or may not be mutually exclusive, like ‘thrilling’, ‘good’, ‘special effects’, is still an open problem in NLP, considering what @jeremy said about NLP being a couple of years behind CV.

My understanding is that @jeremy was setting it up with the analogies below:

The language model from IMDb :: ImageNet-trained architecture
Positive/negative :: dogs/cats

So the analogy below should work as well:

0-to-5 rating :: dog breed classification

Exactly right! Same techniques should work fine.

And multi-label classification: Planet

Really great analogies @ravivijay and @jeremy!

I’m still trying to wrap my head around the fact that we can train a model to predict what the next word will be in a sentence, and then use that model to classify a block of text as positive or negative. It’s a pretty amazing idea.

So thinking about this a little further, it sounds like I should train one language model and then apply it to multiple classification problems separately. Why? Because of how I’m trying to classify different things:

Positivity: I want to classify the text as very pos, pos, neutral, neg, very neg (e.g., like dog breed comp; using a 1-5 rating)

Threatening: I want to classify the text as containing any threats or not (e.g., like dog/cat; 1 or 0)

Suggestion: I want to classify the text as containing a suggestion or not (e.g., like dog/cat; 1 or 0)

It seems like I could turn this into a multi-label classification problem, but that I would lose some of the nuances in looking at each thing separately, no?

I’m hoping to get this nailed down before I ask my co-workers and interns to start labeling data for us to work with, so any feedback on what you’ve seen and what has and hasn’t worked is highly appreciated :slight_smile:

It would be better to do them all in one model. This is called multi-task learning, and it turns out to work better than separate models.

So I read through Ruder’s post.

In addition to encouraging the use of a multi-label classification framework, he has some really interesting ideas about feature engineering that I’ve noted below.

If I understand things correctly, the better way to organize my targets is something like this:

[very pos | pos | neutral | neg | very neg], threat?, suggestion?

Example for a very dissatisfied user with a suggestion: very neg, suggestion
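A minimal plain-Python sketch of encoding that target layout as a single vector (the helper name is hypothetical; the label names are taken from the layout above). The 5-way sentiment becomes a one-hot block, and threat/suggestion become independent binary flags:

```python
SENTIMENTS = ["very pos", "pos", "neutral", "neg", "very neg"]

def encode_target(sentiment, threat=False, suggestion=False):
    """Return a 7-dim target: one-hot sentiment block + threat flag + suggestion flag."""
    vec = [0] * len(SENTIMENTS)
    vec[SENTIMENTS.index(sentiment)] = 1
    return vec + [int(threat), int(suggestion)]

# A very dissatisfied user with a suggestion:
print(encode_target("very neg", suggestion=True))  # [0, 0, 0, 0, 1, 0, 1]
```

A loss over such a target would typically mix a softmax/cross-entropy term for the sentiment block with sigmoid/binary terms for the two flags, which is one way the multi-task framing shows up in practice.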

My thoughts on Ruder’s post in relation to the question stated here:

On the way to this goal, we first need to learn more about the relationships between our tasks, what we can learn from each, and how to combine them most effectively.

Implies that a multi-label approach will not only help with classification, but also help us infer relationships between the labels (e.g., if threatening comments are also comments that contain suggestions, perhaps it indicates we should investigate to see if the threats may derive from users who think their suggestions are being discounted)

A sentiment model might benefit from knowing about the general audience response to a movie or whether a user is more likely to be sarcastic while a parser might be able to leverage prior knowledge of the domain’s tree depth or complexity.

If we have quantitative data from the user in addition to their comment (e.g., overall satisfaction on a 1-5 scale, their age, their ethnicity, etc…), including such data as features in our model may be beneficial. This is an example of feature engineering.
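A tiny sketch of that idea (plain Python; the function, feature names, and scaling choices are all illustrative assumptions): appending quantitative user metadata to a text-derived feature vector.

```python
def build_features(text_vec, satisfaction, age):
    """Append quantitative user metadata to a text feature vector.

    Scaling choices (5-point satisfaction scale, age/100) are illustrative
    assumptions; real preprocessing would use normalization fit on the data.
    """
    return text_vec + [satisfaction / 5.0, age / 100.0]

print(build_features([0.2, 0.7], satisfaction=4, age=35))  # [0.2, 0.7, 0.8, 0.35]
```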

For sentiment analysis, for instance, Yu and Jiang (2016) [20] predict whether the sentence contains a positive or negative domain-independent sentiment word, which sensitizes the model towards the sentiment of the words in the sentence.

Another example of feature engineering, where the presence of certain words or phrases can be added as a feature to our dataset.
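A sketch of that word-presence feature (plain Python; the lexicons here are tiny illustrative stand-ins for a real domain-independent sentiment lexicon):

```python
POSITIVE_WORDS = {"great", "excellent", "love"}   # tiny illustrative lexicon
NEGATIVE_WORDS = {"terrible", "awful", "hate"}

def lexicon_features(sentence):
    """Binary features: does the sentence contain any positive/negative lexicon word?"""
    tokens = set(sentence.lower().split())
    return [int(bool(tokens & POSITIVE_WORDS)),
            int(bool(tokens & NEGATIVE_WORDS))]

print(lexicon_features("I love the special effects"))  # [1, 0]
```

These two binary features would then be concatenated onto whatever representation the model already uses for the sentence.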

Like Jeremy says, you can also try them in one model. The question of what tasks benefit from multi task learning in NLP and how much is actually still pretty open :).

  • e.g. here’s a paper from this year on a related task for tweets. (I am working on a related task for product reviews - extracting usage/suggestions/etc. together with sentence polarity info.)

Right, that is what I gathered as well. Essentially, turn this into a multi-label classification problem and train a model accordingly. If by “one model” you mean something else, let me know what you’re thinking.

How are you setting up your training data for the product reviews problem you’re working on?

Multi-label classification and multi-task classification seem to be two different things. Multi-label classification has a single last layer giving out scores for all labels. Multi-task classification has multiple parallel layers as the last layer, the output of each solving “thriller/not”, “positive/not”, “suggestion/not”.
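A plain-Python sketch contrasting the two output layouts described above (all logit values are made up for illustration; a real model would compute them with a deep learning framework from shared encoder features):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Multi-label: ONE last layer, one independent sigmoid score per label.
multilabel_logits = {"thrilling": 1.2, "positive": 0.4, "suggestion": -0.8}
multilabel_scores = {k: sigmoid(v) for k, v in multilabel_logits.items()}

# Multi-task: parallel heads on the shared features, each with its own
# output type -- here a 5-way softmax for sentiment plus two binary heads.
sentiment_logits = [0.1, 2.0, -0.3, 0.0, -1.5]   # very pos .. very neg
multitask_outputs = {
    "sentiment": softmax(sentiment_logits),  # probabilities sum to 1
    "threat": sigmoid(-2.0),                 # independent binary score
    "suggestion": sigmoid(0.9),
}
```

The key difference shows up in the activations: multi-label scores are independent sigmoids (any subset of labels can be “on”), while a multi-task sentiment head uses a softmax (exactly one class wins) alongside separate binary heads.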

I think this gets confusing because people also distinguish between “one-class” and “multi-class” classification, as well as “multi-class” and “multi-label” classification :). Usually when people say multi-label in research in my field it’s multiple labels per instance…

I think I got you.

You’re saying that “multi-task” means we are trying to predict multiple related things simultaneously. It reminds me of lesson 7 from the original part 1 of this course, where we built a model that predicts both the kind of fish in a picture AND the bounding box coordinates for where it is located.

@jeremy: Is this ability in the fastai framework? Or is this something that would require us doing it more natively in pytorch?

I can think of the examples below. Pls. correct me if I am wrong.
one-class :: dog/cat
multi-class:: dog breed
multi-label:: list all animals in the pic
multi-task:: solve multiple tasks, each of which can be one of one-class/multi-class/multi-label.


In multi-class classification, an instance gets precisely one label (and there are multiple classes). In multi-label classification, an instance can have multiple labels (e.g. a picture can have a dog, a cat, etc.) - I think we’re saying the same thing, yes. And the multi-task setting is about jointly solving multiple classification tasks, whatever they are.

That makes sense.

Now the question is, how to implement? Is this something handled by the fastai framework or something that requires some pytorch work?

I was going to do multiple independent classification tasks but I am interested in the multi-task setting now because there is a relation between sentence polarity and whether it’s a “usage” sentence, or between sentences mentioning specific product features and the presence of advice in a sentence, etc.

I think we need @jeremy to weigh in on this :slight_smile:

me too! haha