Different results when predicting image by image vs model.evaluate/model.predict (keras)

user062 · May 30, 2023, 1:23am

Hello everyone,

I’m experimenting with the DeepWeeds dataset (a multi-class dataset of 9 classes of weeds, also available as part of tensorflow-datasets), and I’m using Keras.

When I run model.evaluate, I get a different accuracy than when I run model.predict on each image (in a for loop).

Also, the creator of the dataset published their model (an .hdf5 file); they did transfer learning on ResNet50.
When I run model.evaluate on his model I get a very low accuracy and a very high loss (same with model.predict on the test_generator), but when I calculate the accuracy by running model.predict on each image I get a very good result.

here is his code:

github.com

AlexOlsen/DeepWeeds/blob/master/deepweeds.py

import argparse
import os
from zipfile import ZipFile
from urllib.request import urlopen
import shutil
import pandas as pd
from time import time
from datetime import datetime
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, TensorBoard, CSVLogger
from keras.optimizers import Adam
import csv
from keras.models import Model, load_model
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
from keras import backend as K
from skimage.io import imread
from skimage.transform import resize
from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50

This file has been truncated.

here is my code (the last cell is the prediction done image by image):

i have several questions:

1/ Why are the results of model.evaluate/model.predict so different from running model.predict image by image? (I tried training my own models and I get a discrepancy there too, but in the opposite direction: high for model.evaluate and lower for model.predict run on each image.)

2/ Why does his code work? He used sigmoid as the activation instead of softmax, binary_crossentropy as the loss function instead of categorical_crossentropy, and accuracy as the metric instead of categorical_accuracy.

3/ Why did he augment the validation dataset? Shouldn’t only the training dataset be augmented, to introduce variation when the dataset is small?
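To make question 2 more concrete, here is a small NumPy sketch of how I understand the two metrics to differ on one-hot labels. The helper functions and the toy predictions are made up for illustration; they are not taken from the DeepWeeds code, and I am assuming (not certain) that Keras resolves the plain accuracy metric to an element-wise binary accuracy when the loss is binary_crossentropy.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Hypothetical helper: threshold every output independently and count
    # how many individual label entries match the one-hot target.
    return float(np.mean((y_prob > threshold).astype(int) == y_true))

def categorical_accuracy(y_true, y_prob):
    # Hypothetical helper: one prediction per sample (the argmax), compared
    # with the true class index.
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1)))

num_classes = 9  # DeepWeeds has 9 classes

# Four toy samples whose true classes are 0, 1, 2 and 3 (one-hot encoded).
y_true = np.eye(num_classes, dtype=int)[[0, 1, 2, 3]]

# Made-up sigmoid-style outputs: samples 0 and 1 are right,
# samples 2 and 3 confidently predict class 8 instead.
y_prob = np.full((4, num_classes), 0.05)
y_prob[0, 0] = 0.9
y_prob[1, 1] = 0.9
y_prob[2, 8] = 0.9
y_prob[3, 8] = 0.9

print("binary accuracy:     ", binary_accuracy(y_true, y_prob))
print("categorical accuracy:", categorical_accuracy(y_true, y_prob))
```

With these toy numbers only half of the samples are classified correctly, yet the element-wise score is much higher, because most of the 9 entries per sample are zeros that are trivially matched. If my assumption about the metric is right, the accuracy values reported by the two setups would not be directly comparable.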

Thank you very much

Edit: the dataset is divided into test/validation/training splits by the creator; there are CSV files for each in the labels directory of the git repo.

user062 · May 30, 2023, 6:55pm

I figured out question 1: it was because I didn’t rescale the images. I did pass the preprocessing function to the image generator, but apparently that doesn’t work.
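For anyone hitting the same thing, this is roughly the sanity check I would run: push the same image through the batch path and the per-image path and compare the arrays that would reach the model. It is only a sketch in plain NumPy; the rescale helper, the 1/255 factor and the 32x32 image size are my own assumptions for illustration, not necessarily what the DeepWeeds code uses.

```python
import numpy as np

def rescale(image_uint8):
    # Hypothetical preprocessing step: map uint8 pixels [0, 255] to floats in [0, 1].
    # The 1/255 factor is an assumption; use whatever the model was trained with.
    return image_uint8.astype("float32") / 255.0

# Fake uint8 "images" standing in for the test set (the size is arbitrary here).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Path A: preprocess the whole batch at once (what a generator would feed the model).
batch_scaled = rescale(batch)

# Path B: preprocess image by image, as in a for loop around model.predict.
single_scaled = np.stack([rescale(img) for img in batch])

# Path C: the mistake, where one path forgets the rescaling entirely.
single_unscaled = batch.astype("float32")

print("A matches B:", np.allclose(batch_scaled, single_scaled))
print("A matches C:", np.allclose(batch_scaled, single_unscaled))
```

If the two paths do not produce the same array for the same image, any difference between model.evaluate and the per-image loop is more likely a preprocessing mismatch than a problem with the model itself.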