Hi,
I’m having trouble replicating the log loss that Keras calculates. I understand log loss / cross-entropy and know how it is calculated, yet something is going terribly wrong.
According to Keras, the cross-entropy on my validation set is around 0.0xxx. But when I use sklearn.metrics.log_loss with the one-hot encoded labels and my network's predictions, I get a value of 3.87xxx. For y_true I pass a one-hot encoded array of shape (n_samples, n_classes), and y_pred is the array of corresponding softmax predictions.
So:
val_predictions[:5]
>>>array([[  9.77910817e-01,   2.20891740e-02],
          [  9.98937905e-01,   1.06205838e-03],
          [  9.99959946e-01,   4.00941935e-05],
          [  9.99999404e-01,   5.46007016e-07],
          [  9.99951005e-01,   4.89662743e-05]], dtype=float32)
y_val[:5]
>>>array([[ 1.,  0.],
          [ 1.,  0.],
          [ 1.,  0.],
          [ 1.,  0.],
          [ 1.,  0.]])
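For context, here is roughly how both numbers come about (a minimal sketch, assuming the model is compiled with categorical_crossentropy to match the one-hot labels; the optimizer, metric, and the x_train/x_val names are placeholders, and the actual architecture is omitted):

# Sketch only; the real model definition is not shown here.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # matches the one-hot labels
              metrics=['accuracy'])

# During fit(), Keras reports a val_loss of roughly 0.0xxx:
model.fit(x_train, y_train, validation_data=(x_val, y_val))

# Softmax predictions on the same validation data:
val_predictions = model.predict(x_val)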
I was able to replicate sklearn's log loss with numpy, which made me wonder even more.
With numpy I used:
-np.mean(np.sum(y_val * np.log(val_predictions), axis=1))
and with sklearn:
log_loss(y_val, val_predictions)
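Put together as a runnable check (y_val and val_predictions are the arrays shown above; the eps clipping is only there to guard against log(0), which sklearn's log_loss also does internally):

import numpy as np
from sklearn.metrics import log_loss

eps = 1e-15  # guard against log(0); sklearn clips predictions similarly
p = np.clip(val_predictions, eps, 1 - eps)

# Categorical cross-entropy: mean over samples of -sum_k y_k * log(p_k)
manual = -np.mean(np.sum(y_val * np.log(p), axis=1))

print(manual)                            # ~3.87 for me
print(log_loss(y_val, val_predictions))  # agrees with the manual value

The two agree with each other, but not with the ~0.0xxx that Keras reports.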
I appreciate your help.