Hi all,
I've been experimenting with BERT & friends in Hugging Face's transformers package. I came across their pipelines and wanted to try sentiment analysis.
When I extend their example, I get some results I was not expecting.
Copying their example verbatim gives the expected result:
In [1]: from transformers import pipeline
...:
...: nlp = pipeline("sentiment-analysis")
...:
...: print(nlp("I hate you"))
...: print(nlp("I love you"))
[{'label': 'NEGATIVE', 'score': 0.9991129}]
[{'label': 'POSITIVE', 'score': 0.99986565}]
If I combine the two sentences in a list, I get the same results:
In [2]: print(nlp(["I hate you", "I love you"]))
[{'label': 'NEGATIVE', 'score': 0.9991129}, {'label': 'POSITIVE', 'score': 0.99986565}]
However, passing a list of three sentences raises an error:
In [3]: print(nlp(["I hate you", "I love you", "I am ambivalent to you"]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-235662c92ee9> in <module>
----> 1 print(nlp(["I hate you", "I love you", "I am ambivalent to you"]))
/anaconda3/envs/bertExp2/lib/python3.6/site-packages/transformers/pipelines.py in __call__(self, *args, **kwargs)
503 def __call__(self, *args, **kwargs):
504 outputs = super().__call__(*args, **kwargs)
--> 505 scores = np.exp(outputs) / np.exp(outputs).sum(-1)
506 return [{"label": self.model.config.id2label[item.argmax()], "score": item.max()} for item in scores]
507
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
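Looking at the traceback, my best guess is that line 505 is where things go wrong: np.exp(outputs).sum(-1) has shape (3,), which NumPy cannot broadcast against the (3, 2) array of exponentiated outputs. Here is a minimal NumPy-only sketch of that, using made-up logits rather than real model output. The keepdims=True variant is my own assumption about what a row-wise softmax should look like, not something taken from the library.

```python
import numpy as np

# Hypothetical logits for three sentences and two classes (not real model output).
logits = np.array([[4.0, -3.0], [-4.0, 4.5], [0.2, 0.1]])
exp = np.exp(logits)

# Same expression as in the traceback: (3, 2) divided by (3,) cannot broadcast.
try:
    exp / exp.sum(-1)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# With keepdims=True the row sums have shape (3, 1),
# so each row is divided by its own sum.
probs = exp / exp.sum(-1, keepdims=True)
print(probs.sum(-1))
```

If this reading is right, the error is about array shapes rather than about the sentences themselves.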
Moreover, at other times, combining two sentences into one nlp() call does lead to different results!
In [4]: sent1 = "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."
...: sent2 = "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness"
...: print(nlp(sent1))
...: print(nlp(sent2))
...: print(nlp([sent1, sent2]))
...: print(nlp([sent2, sent1]))
[{'label': 'POSITIVE', 'score': 0.99777234}]
[{'label': 'POSITIVE', 'score': 0.9786857}]
[{'label': 'POSITIVE', 'score': 1.627845}, {'label': 'POSITIVE', 'score': 0.9786857}]
[{'label': 'POSITIVE', 'score': 0.59639287}, {'label': 'POSITIVE', 'score': 0.99197847}]
(Note how the score for the first sentence in each pair is now very different from before, and in one case is greater than 1.)
Sometimes the label even flips:
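One possible explanation, judging only from the expression on line 505 of pipelines.py shown in the traceback: with exactly two sentences, a (2, 2) array divided by a (2,) array does broadcast, but each column gets divided by a row sum instead of each row being divided by its own sum. The sketch below uses made-up logits (not real model output) to show how that could give the first sentence a score above 1 while leaving the second one untouched. It may well not account for everything I am seeing.

```python
import numpy as np

# Hypothetical logits for two sentences, both leaning towards class 1
# (not real model output).
logits = np.array([[-1.0, 2.0], [-2.0, 1.0]])
exp = np.exp(logits)

# (2, 2) / (2,) broadcasts without error: column j is divided by the sum of row j.
as_in_traceback = exp / exp.sum(-1)

# Assumed intended behaviour: every row divided by its own sum.
row_wise = exp / exp.sum(-1, keepdims=True)

print(as_in_traceback.max(-1))  # first entry is not a valid probability
print(row_wise.max(-1))
```

In this toy case only the entries on the diagonal agree between the two versions, which would fit the second sentence keeping its score while the first one changes.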
In [5]: sent3 = "We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America."
...: print(nlp([sent2, sent3]))
[{'label': 'NEGATIVE', 'score': 0.12308767}, {'label': 'POSITIVE', 'score': 0.9993409}]
Now, sentence 2 is considered negative!
Does anyone with experience of these pipelines know how this function is meant to be used with several sentences at once?
Thanks!