Sklearn log_loss error

Gkarmakar · January 13, 2017, 5:14am

Hi,

I am getting this error when using the log_loss function from sklearn with only one label. I looked it up and saw that there was a known sklearn bug with a single label, but it was supposed to be fixed in a later release. I upgraded sklearn using
conda update scikit-learn but that didn't resolve it.

The code:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import log_loss

x = [i*.0001 for i in range(1,10000)]
y = [log_loss([1],[[i*.0001,1-(i*.0001)]],eps=1e-15) for i in range(1,10000,1)]

plt.plot(x, y)
plt.axis([-.05, 1.1, -.8, 10])
plt.title("Log Loss when true label = 1")
plt.xlabel("predicted probability")
plt.ylabel("log loss")

plt.show()

Error:
ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.

carlosdeep · January 23, 2017, 7:19pm

I am facing the same issue. Any hints here?

Gkarmakar · January 25, 2017, 1:40am

I just copied and pasted from sklearn example code and it worked just fine for me.

Christina · February 9, 2017, 9:29pm

I am encountering this same error:

ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.

when I try to run this example code in the dogs_cats_redux Jupyter notebook that plots the log loss curve when true label = 1:

y = [log_loss([1],[[i*.0001,1-(i*.0001)]],eps=1e-15) for i in range(1,10000,1)]

It fails on that first argument in the call to log_loss. I am not sure why; it should work.

I read and re-read the SKlearn page on log_loss: log_loss — scikit-learn 1.5.2 documentation

Does anyone have any ideas? This example code should not hold anyone back, it is only plotting the graph at this link: http://wiki.fast.ai/index.php/File:Log_loss_graph.png

taposh · February 19, 2017, 6:28pm

I am getting the same error. I found a git issue that says it's a class imbalance problem between truth and prediction. Our truth is set to 1 while our predictions have a range.
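
For anyone who wants to check this on a single prediction, here is a minimal sketch. It assumes the problem is only that log_loss cannot infer the second class from a one-element y_true, and that passing labels=[0, 1] is enough to tell it. The 0.9 probability is just a made-up value for illustration.

```python
import math
from sklearn.metrics import log_loss

# Made-up example value: probability the model assigns to class 1.
p_one = 0.9

# Columns of y_pred follow the sorted labels [0, 1],
# so the second column is the probability of class 1.
loss = log_loss([1], [[1 - p_one, p_one]], labels=[0, 1])

# For a single sample, log loss is -log(probability of the true class).
expected = -math.log(p_one)
print(loss, expected)
```

The two printed numbers should agree, which suggests the labels argument is doing what the error message asks for.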

renjithmadhavan · February 25, 2017, 6:16pm

I did it like this below and worked around it.

zaoyang · June 23, 2017, 3:10am

Had this error too. This worked for me.

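# Visualize log loss over a range of predicted probabilities
import matplotlib.pyplot as plt
from sklearn.metrics import log_loss
x = [i * 0.0001 for i in range(1,10000)]
#y = [log_loss([1],[[i*.0001, 1-(i*.0001)]],eps=1e-15) for i in range(1,10000,1)]
# labels tells sklearn that both classes exist; columns follow the sorted labels [0, 1],
# so with true label 0 the loss for each point is -log(x)
y = [log_loss([0],[[i*.0001,1-(i*.0001)]],eps=1e-15, labels=[1,0]) for i in range(1,10000,1)]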

plt.plot(x,y)
plt.axis([-0.05,1.1, -.8, 10])
plt.title("Log loss when true label = 1")
plt.xlabel("Predicted probability")
plt.ylabel("log loss")

plt.show()
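
If you only want the numbers behind the curve, a rough sketch without sklearn is below. It assumes the single-sample log loss is just -log of the probability given to the true class; single_sample_log_loss is a helper name made up for this sketch, not an sklearn function.

```python
import math

def single_sample_log_loss(p_true_class):
    """Log loss for one sample, given the probability assigned to its true class."""
    return -math.log(p_true_class)

# A few probabilities from the same range the plot covers.
probs = [0.0001, 0.1, 0.5, 0.9, 0.9999]
losses = [single_sample_log_loss(p) for p in probs]

for p, loss in zip(probs, losses):
    print(f"p = {p:<6} -> log loss = {loss:.4f}")
```

The loss shrinks toward 0 as the probability of the true class approaches 1 and grows quickly as it approaches 0.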

Atlas7 · July 2, 2017, 6:59pm

very nice! Worked!

sebsch · March 18, 2018, 11:04am

I had the same problem.
In my case shuffling the rows in the dataframe worked just fine. (The data was sorted, so the first 100 rows or so had the same value.)

Use:
dataframe = dataframe.sample(frac=1)
frac=1 returns all rows in random order. sample returns a new dataframe, so assign the result.
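
A small sketch of the shuffle, on a toy dataframe made up for illustration (the column names and values are not from any real dataset). The random_state and reset_index calls are optional extras added here so the result is repeatable and the index is clean.

```python
import pandas as pd

# Hypothetical toy frame, sorted by label so the first rows all share one value.
df = pd.DataFrame({"feature": range(10), "label": [0] * 5 + [1] * 5})

# sample(frac=1) returns a new, shuffled frame; df itself is left unchanged.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

print(shuffled["label"].tolist())
```

After shuffling, a slice of the first rows is much more likely to contain both labels, which is what log_loss needs when labels is not passed.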