Question on labeling text for sentiment analysis

How should we label text in cases where we don’t merely want to know if it is positive or negative, but how positive or negative it is?

In my domain, for example, folks want to distinguish between positive and very positive reviews. I was thinking it might be better to label the reviews as a number from 1 (really negative) to 5 (really positive) instead of just using 1 or 0.

Also, given that we are using a pretrained model here, can we do multi-label classification where a review might be classified as ‘positive’, ‘thrilling’ and ‘great special effects’? Basically like the planet competition but for text.

While this does not necessarily answer your question regarding multi-labels, with binary classification I believe you can still figure out how “strong” the sentiment is by examining the probability of the prediction, even if the labels are just binary (positive, negative). As the lesson 1 notebook shows, you can get the prediction probability at inference time and use it as a guide for how strong the prediction is. So, for example, a positive prediction (1) with a probability of 0.9 may indicate that this is a very positive text.
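As a rough sketch of the idea above (plain Python; the function name and the 0.9/0.6 cut-offs are illustrative assumptions, not tuned values), mapping a binary classifier’s positive-class probability to a coarse strength label might look like:

```python
def sentiment_strength(p_positive, strong=0.9, weak=0.6):
    """Map a binary classifier's P(positive) to a coarse strength label.

    The strong/weak thresholds are illustrative, not tuned values.
    """
    if p_positive >= strong:
        return "very positive"
    if p_positive >= weak:
        return "positive"
    if p_positive > 1 - weak:
        return "neutral"
    if p_positive > 1 - strong:
        return "negative"
    return "very negative"

print(sentiment_strength(0.95))  # very positive
print(sentiment_strength(0.5))   # neutral
```

In practice you would want to calibrate those thresholds on held-out data, since raw classifier probabilities are not always well calibrated.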

Hi - you can use a binary classification scheme and get the information you want from the score of the classifier. Or, if you have enough data, you can try n labels (n > 2) - the issue is that with multiple levels for sentiment-analysis-type tasks, it sometimes becomes difficult to get annotators to agree on the label. One thing people have tried is using the stars on review sites as an implicit label, but there’s been work showing the sentiment of reviews doesn’t correlate all that well with how many stars people give… so it makes sense to get labels manually for your reviews, as you suggest.

I think giving n labels which may or may not be mutually exclusive, like ‘thrilling’, ‘good’, ‘special effects’, is still an open problem in NLP, considering what @jeremy said about NLP being a couple of years behind CV.

My understanding is that @jeremy was setting it up with the analogies below:

The language model from IMDb :: ImageNet-trained architecture
Positive/negative :: dogs/cats

So the analogy below should work as well:

0-to-5 rating :: dog breed classification

Exactly right! Same techniques should work fine.

And multi-label classification: Planet

Really great analogies @ravivijay and @jeremy!

I’m still trying to wrap my head around the fact that we can train a model to predict what the next word will be in a sentence, and then use that model to classify a block of text as positive or negative. It’s a pretty amazing idea.

So thinking about this a little further, it sounds like I should train one language model and then apply it to multiple classification problems separately. Why? Because of how I’m trying to classify different things:

Positivity: I want to classify the text as very pos, pos, neutral, neg, very neg (e.g., like dog breed comp; using a 1-5 rating)

Threatening: I want to classify the text as containing any threats or not (e.g., like dog/cat; 1 or 0)

Suggestion: I want to classify the text as containing a suggestion or not (e.g., like dog/cat; 1 or 0)

It seems like I could turn this into a multi-label classification problem, but that I would lose some of the nuances in looking at each thing separately, no?

I’m hoping to get this nailed down before I ask my co-workers and interns to start labeling data for us to work with, so any feedback on what you’ve seen and what has and hasn’t worked is highly appreciated :slight_smile:

It would be better to do them all in one model. This is called multi-task learning, and it turns out to work better than separate models.

So I read through Ruder’s post.

In addition to encouraging the use of a multi-label classification framework, he has some really interesting ideas about feature engineering that I’ve noted below.

If I understand things correctly, the better way to organize my targets is something like this:

[very pos | pos | neutral | neg | very neg], threat?, suggestion?

Example for a very dissatisfied user with a suggestion: very neg, suggestion
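A minimal plain-Python sketch of encoding that target layout as a single vector (the helper name is hypothetical; the label names are taken from the layout above). The 5-way sentiment becomes a one-hot block, and threat/suggestion become independent binary flags:

```python
SENTIMENTS = ["very pos", "pos", "neutral", "neg", "very neg"]

def encode_target(sentiment, threat=False, suggestion=False):
    """Return a 7-dim target: one-hot sentiment block + threat flag + suggestion flag."""
    vec = [0] * len(SENTIMENTS)
    vec[SENTIMENTS.index(sentiment)] = 1
    return vec + [int(threat), int(suggestion)]

# A very dissatisfied user with a suggestion:
print(encode_target("very neg", suggestion=True))  # [0, 0, 0, 0, 1, 0, 1]
```

A loss over such a target would typically mix a softmax/cross-entropy term for the sentiment block with sigmoid/binary terms for the two flags, which is one way the multi-task framing shows up in practice.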

My thoughts on Ruder’s post in relation to the question stated here:

On the way to this goal, we first need to learn more about the relationships between our tasks, what we can learn from each, and how to combine them most effectively.

Implies that a multi-label approach will not only help with classification, but also help us infer relationships between the labels (e.g., if threatening comments are also comments that contain suggestions, perhaps it indicates we should investigate to see if the threats may derive from users who think their suggestions are being discounted)

A sentiment model might benefit from knowing about the general audience response to a movie or whether a user is more likely to be sarcastic while a parser might be able to leverage prior knowledge of the domain’s tree depth or complexity.

If we have quantitative data from the user in addition to their comment (e.g., overall satisfaction on a 1-5 scale, their age, their ethnicity, etc…), including such data as features in our model may be beneficial. This is an example of feature engineering.
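A tiny sketch of that idea (plain Python; the function, feature names, and scaling choices are all illustrative assumptions): appending quantitative user metadata to a text-derived feature vector.

```python
def build_features(text_vec, satisfaction, age):
    """Append quantitative user metadata to a text feature vector.

    Scaling choices (5-point satisfaction scale, age/100) are illustrative
    assumptions; real preprocessing would use normalization fit on the data.
    """
    return text_vec + [satisfaction / 5.0, age / 100.0]

print(build_features([0.2, 0.7], satisfaction=4, age=35))  # [0.2, 0.7, 0.8, 0.35]
```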

For sentiment analysis, for instance, Yu and Jiang (2016) [20] predict whether the sentence contains a positive or negative domain-independent sentiment word, which sensitizes the model towards the sentiment of the words in the sentence.

Another example of feature engineering, where the presence of certain words or phrases can be added as a feature to our dataset.
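A sketch of that word-presence feature (plain Python; the lexicons here are tiny illustrative stand-ins for a real domain-independent sentiment lexicon):

```python
POSITIVE_WORDS = {"great", "excellent", "love"}   # tiny illustrative lexicon
NEGATIVE_WORDS = {"terrible", "awful", "hate"}

def lexicon_features(sentence):
    """Binary features: does the sentence contain any positive/negative lexicon word?"""
    tokens = set(sentence.lower().split())
    return [int(bool(tokens & POSITIVE_WORDS)),
            int(bool(tokens & NEGATIVE_WORDS))]

print(lexicon_features("I love the special effects"))  # [1, 0]
```

These two binary features would then be concatenated onto whatever representation the model already uses for the sentence.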

Like Jeremy says, you can also try them in one model. The question of what tasks benefit from multi task learning in NLP and how much is actually still pretty open :).

  • e.g. here’s a paper from this year on a related task for tweets. (I am working on a related task for product reviews - extracting usage/suggestions/etc. together with sentence polarity info.)

Right, that is what I gathered as well. Essentially, turn this into a multi-label classification problem and train a model accordingly. If by “one model” you mean something else, let me know what you’re thinking.

How are you setting up your training data for the product reviews problem you’re working on?

Multi-label classification and multi-task classification seem to be two different things. Multi-label classification has a single last layer giving out scores for all labels. Multi-task classification has multiple parallel layers as the last layer, the output of each solving “thriller/not”, “positive/not”, “suggestion/not”.
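A plain-Python sketch contrasting the two output layouts described above (all logit values are made up for illustration; a real model would compute them with a deep learning framework from shared encoder features):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Multi-label: ONE last layer, one independent sigmoid score per label.
multilabel_logits = {"thrilling": 1.2, "positive": 0.4, "suggestion": -0.8}
multilabel_scores = {k: sigmoid(v) for k, v in multilabel_logits.items()}

# Multi-task: parallel heads on the shared features, each with its own
# output type -- here a 5-way softmax for sentiment plus two binary heads.
sentiment_logits = [0.1, 2.0, -0.3, 0.0, -1.5]   # very pos .. very neg
multitask_outputs = {
    "sentiment": softmax(sentiment_logits),  # probabilities sum to 1
    "threat": sigmoid(-2.0),                 # independent binary score
    "suggestion": sigmoid(0.9),
}
```

The key difference shows up in the activations: multi-label scores are independent sigmoids (any subset of labels can be “on”), while a multi-task sentiment head uses a softmax (exactly one class wins) alongside separate binary heads.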

I think this gets confusing because people also distinguish between “one-class” and “multi-class” classification, as well as “multi-class” and “multi-label” classification :). Usually when people say multi-label in research in my field it’s multiple labels per instance…

I think I got you.

You’re saying that “multi-task” means we are trying to predict multiple related things simultaneously. It reminds me of lesson 7 from the original part 1 of this course, where we built a model that predicts both the kind of fish in a picture AND the bounding box coordinates for where it is located.

@jeremy: Is this ability in the fastai framework? Or is this something that would require us doing it more natively in pytorch?

I can think of the examples below. Pls. correct me if I am wrong.
one-class :: dog/cat
multi-class:: dog breed
multi-label:: list all animals in the pic
multi-task:: solve multiple tasks, each of which can be one of one-class/multi-class/multi-label.


In multi-class classification, an instance gets precisely one label (and there are multiple classes). In multi-label classification, an instance can have multiple labels (e.g. a picture can have a dog, a cat, etc.) - I think we’re saying the same thing, yes. And the multi-task setting is about jointly solving multiple classification tasks, whatever they are.

That makes sense.

Now the question is, how to implement? Is this something handled by the fastai framework or something that requires some pytorch work?

I was going to do multiple independent classification tasks but I am interested in the multi-task setting now because there is a relation between sentence polarity and whether it’s a “usage” sentence, or between sentences mentioning specific product features and the presence of advice in a sentence, etc.

I think we need @jeremy to weigh in on this :slight_smile:

me too! haha