Accuracy reported by Model.fit seems wildly off

I must be doing something wrong, but I really can’t figure out what it is. I’m training a dead-simple model on 50K items, and the fit() method reports an “acc:” of ~75%. But when I invoke the model’s call() method on individual items, its output matches the label over 95% of the time. Shouldn’t these percentages be identical? I doubt there’s such a glaring bug in Keras, so either I don’t understand what fit() is reporting, or I’m unclear on how call() is supposed to work. :confused: Does anyone have a clue about what’s really going on?
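For what it’s worth, here’s a toy illustration of the two computations I *believe* are in play (the Keras internals are my assumption, not verified): I think fit()’s accuracy rounds each output element and averages element-wise matches, while my manual check below uses exact float equality and requires every element of a row to match. The two can clearly disagree:

```python
import numpy as np

# Hypothetical 2-element sigmoid outputs and their labels.
preds = np.array([[1.0, 0.0],    # fully saturated sigmoid outputs
                  [0.6, 0.2]])   # unsaturated outputs
labels = np.array([[1.0, 0.0],
                   [1.0, 0.0]])

# What I think fit() reports: round each output element, then average
# the element-wise matches over every element in the batch.
fit_style = np.mean(np.round(preds) == labels)

# What my manual check does: exact float equality, with all elements
# of a row required to match, averaged over rows.
manual_style = np.mean([(p == l).all() for p, l in zip(preds, labels)])

print(fit_style)     # 1.0 -- every element rounds to its label
print(manual_style)  # 0.5 -- only the saturated row matches exactly
```

So the two numbers measure different things, though that alone doesn’t obviously explain the direction of my discrepancy.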

Here is my code, followed by an output fragment:

from random import Random
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import theano as T
import numpy as np

# Generate training data and labels.
rnd = Random()
datrn = []
labtrn = []
for _ in xrange(50000):
    a = rnd.randint(-99999, 99999)
    b = rnd.randint(-99999, 99999)
    c = rnd.randint(-99999, 99999)
    d = rnd.randint(-99999, 99999)
    datrn.append([a, b, c, d])
    labtrn.append([a <= c, b <= d])

# Train a simple dense layer to predict the result.  Any solution
# where weights [0,2] and [1,3] are opposite numbers will achieve
# perfect accuracy.
mo = Sequential([Dense(2, input_dim=4, activation='sigmoid')])
mo.compile(Adam(lr=0.01), metrics=['accuracy'], loss='mean_squared_error')
mo.fit(datrn, labtrn, nb_epoch=40, batch_size=1000)

# Check accuracy manually via Model.call() on the input data.
intensor = T.tensor.vector('in')
outensor = mo.call(intensor)
modfun = T.function([intensor], outensor)
def labelmatch(i):
    '''True if modfun accurately predicts the label of datrn[i]'''
    return (modfun(datrn[i]) == labtrn[i]).all()
predicts_correctly = np.array([labelmatch(i) for i in xrange(len(datrn))])
print float(np.count_nonzero(predicts_correctly))/len(predicts_correctly)
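To sanity-check the comment in the code about a perfect solution, here’s a quick sketch in pure numpy (weights chosen by hand rather than learned, and scaled small to keep exp() from overflowing) showing that opposite weights at positions [0,2] and [1,3] do solve the task:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hand-picked weights: output 0 sees c - a, output 1 sees d - b
# (scaled by 1e-4), so sigmoid(...) >= 0.5 exactly when a <= c
# (respectively b <= d).
W = 1e-4 * np.array([[-1.0,  0.0],
                     [ 0.0, -1.0],
                     [ 1.0,  0.0],
                     [ 0.0,  1.0]])

rng = np.random.RandomState(0)
data = rng.randint(-99999, 99999, size=(1000, 4))
labels = np.stack([data[:, 0] <= data[:, 2],
                   data[:, 1] <= data[:, 3]], axis=1)

preds = sigmoid(data @ W) >= 0.5
print((preds == labels).mean())  # 1.0 -- every sample classified correctly
```

So the task itself is trivially learnable by this layer; the question is only about the mismatch in the two reported accuracies.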

And here are the last few lines of output:

Epoch 40/40
50000/50000 [==============================] - 0s - loss: 0.0233 - acc: 0.7495     
0.95052

Well, it could be a Keras bug, I suppose: