Different results when predicting image by image vs model.evaluate/model.predict (keras)

Hello everyone,

I’m experimenting with the DeepWeeds dataset (a multi-class dataset of 9 classes of weeds, also available as part of tensorflow-datasets), and I’m using Keras.

When I run model.evaluate, I get a different accuracy than when I run model.predict on each image (in a for loop).
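To make the comparison concrete, here is a minimal sketch (not my actual notebook code) of the two ways of computing accuracy that I'm comparing. The model and data here are placeholders; the point is that both paths should see identically preprocessed inputs:

```python
import numpy as np
import tensorflow as tf

def manual_accuracy(model, x_test, y_test):
    """Accuracy computed image by image, mimicking the for-loop approach."""
    correct = 0
    for img, label in zip(x_test, y_test):
        # model.predict expects a batch dimension
        probs = model.predict(img[np.newaxis, ...], verbose=0)
        if np.argmax(probs, axis=-1)[0] == label:
            correct += 1
    return correct / len(x_test)
```

If the per-image loop feeds the model exactly the same tensors as model.evaluate, the two numbers should agree to within floating-point noise.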

Also, the creator of the dataset published their model (a .hdf5 file; they did transfer learning on ResNet50).
When I run model.evaluate on their model, I get a very low accuracy and a very high loss (same with model.predict on the test_generator), but when I calculate the accuracy by running model.predict on each image, I get a very good result:

[screenshot of the per-image prediction results from the Kaggle notebook]

Here is their code:

Here is my code (the last cell is the prediction done image by image):

I have several questions:

1/ Why are the results of model.evaluate/model.predict so different from running model.predict image by image? (I tried training my own models and got a discrepancy too, but in the opposite direction: high accuracy for model.evaluate, and lower for model.predict run on each image.)

2/ Why does their code work? They used sigmoid for the activation instead of softmax, binary_crossentropy as the loss function instead of categorical_crossentropy, and accuracy as the metric instead of categorical_accuracy.
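For context on question 2, here is a sketch of the two compile setups side by side (the 9-class head and input shape are placeholders). One relevant Keras behavior I'm aware of: the string metric "accuracy" is resolved differently depending on the loss, so with binary_crossentropy it becomes binary_accuracy, which averages over all 9 independent outputs and can read much higher than the true classification accuracy:

```python
import tensorflow as tf

num_classes = 9
inputs = tf.keras.Input(shape=(32,))

# Standard multi-class head: softmax + categorical_crossentropy.
softmax_model = tf.keras.Model(
    inputs, tf.keras.layers.Dense(num_classes, activation="softmax")(inputs))
softmax_model.compile(loss="categorical_crossentropy", metrics=["accuracy"])

# The creator's setup: sigmoid + binary_crossentropy. Each of the 9 outputs
# is scored independently; training still pushes the correct class up, but
# Keras resolves "accuracy" to binary_accuracy here, which counts every
# near-zero output on the 8 wrong classes as a correct binary decision.
sigmoid_model = tf.keras.Model(
    inputs, tf.keras.layers.Dense(num_classes, activation="sigmoid")(inputs))
sigmoid_model.compile(loss="binary_crossentropy", metrics=["accuracy"])
```

For example, a prediction that picks the wrong class out of 9 still gets 7/9 of its per-output binary decisions right, so binary_accuracy reports roughly 0.78 where categorical_accuracy reports 0.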

3/ Why did they augment the validation dataset? Shouldn’t only the training dataset be augmented, in the case of a small dataset, to introduce variations?
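This is the setup I would have expected (a hypothetical sketch, not the creator's actual generators): random transforms on the training split only, while the validation split gets the same preprocessing but no augmentation, so validation metrics reflect the real images:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet50 import preprocess_input

# Training generator: ResNet50 preprocessing plus random augmentation.
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=20,
    horizontal_flip=True)

# Validation generator: identical preprocessing, no random transforms.
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
```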

Thank you very much.

Edit: the dataset is divided into test/validation/training splits by the creator; there are CSV files for each in the labels directory of the Git repo.

I figured out question 1: it’s because I didn’t rescale the images. I did pass the preprocessing function to the image generator, but apparently that doesn’t work.
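In other words, the fix was to apply the same preprocessing in the per-image loop that the generator applies during evaluation. A minimal sketch of what that looks like (the helper name is mine, and it assumes the model was trained on ResNet50-preprocessed inputs):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input

def predict_one(model, img):
    """Predict a single raw image (HxWxC, pixel values in [0, 255]).

    preprocess_input applies the same transformation the training
    generator used, so this path now matches model.evaluate.
    """
    x = preprocess_input(img.astype("float32"))
    return model.predict(x[np.newaxis, ...], verbose=0)
```

Without this step, the loop feeds raw [0, 255] pixels into a model that expects preprocessed inputs, which explains the mismatched accuracies.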