How do I understand the steps parameter in predict_generator?

I'm not sure about the role of the steps parameter in predict_generator. My understanding is that steps is the amount of data the generator produces, but some people say that's wrong and others say it's right, and I still couldn't find the answer by experimenting. Here is what I'm doing: I use openslide to read a 5000x5000 image and generate 100x100 tiles from it to predict on. That should give me 2500 tiles of 100x100, but when I set steps=2500 I get an error.

This is the code:
```python
# coding=utf-8
from __future__ import division
from keras.models import load_model
import openslide
import numpy as np
import Get_file_name
import generator
import matplotlib.pyplot as plt

def predict_model(img):
    model = load_model(Get_file_name.model_path[0])
    y = model.predict_generator(generator.pre_gen(img), steps=30)
```


This is the error:

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/", line 914, in _bootstrap_inner
  File "/usr/lib/python3.5/", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/", line 612, in data_generator_task
    generator_output = next(self._generator)


Traceback (most recent call last):
  File "/home/zh/视频/MitosisDetection/mitosisDetection/", line 17, in <module>
  File "/home/zh/视频/MitosisDetection/mitosisDetection/", line 12, in predict_model
    y= model.predict_generator(generator.pre_gen(img),steps=2500)
  File "/usr/local/lib/python3.5/dist-packages/keras/legacy/", line 88, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/", line 1183, in predict_generator
  File "/usr/local/lib/python3.5/dist-packages/keras/legacy/", line 88, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/", line 2108, in predict_generator
    outs = self.predict_on_batch(x)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/", line 1696, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/", line 2229, in __call__
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/", line 778, in run
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/", line 961, in _run
    % (np_val.shape,, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape () for Tensor 'conv2d_1_input:0', which has shape '(?, 64, 64, 3)'

Process finished with exit code 1
```
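The final error says Keras tried to feed a value of shape `()` into an input that expects batches of shape `(?, 64, 64, 3)`. Since the first traceback shows the generator thread failing inside `next(self._generator)`, one likely explanation is that the generator stopped producing batches before 2500 steps had been consumed, leaving nothing to feed. A quick check is to drain a fresh copy of the generator outside Keras and compare how many items it yields, and their shapes, with `steps` and the model's input shape. A minimal sketch, assuming the generator yields NumPy arrays; `inspect_generator` and `toy_tiles` are hypothetical names, not Keras functions:

```python
import numpy as np


def inspect_generator(gen, max_items=10000):
    """Drain up to max_items items from gen and report what it produced.

    Returns (count, shapes): how many items were yielded and the set of
    distinct shapes, to compare with `steps` and the model's input shape.
    """
    count = 0
    shapes = set()
    for item in gen:
        count += 1
        shapes.add(np.shape(item))
        if count >= max_items:  # guard against generators that loop forever
            break
    return count, shapes


def toy_tiles(n_tiles):
    # Hypothetical stand-in for a tile generator: n_tiles batches of one 64x64 RGB tile.
    for _ in range(n_tiles):
        yield np.zeros((1, 64, 64, 3), dtype=np.float32)


count, shapes = inspect_generator(toy_tiles(12))
print(count, shapes)  # 12 {(1, 64, 64, 3)}
```

Draining consumes the generator, so create a new one for the actual predict_generator call. If the count is below `steps`, or a shape differs from `(batch, 64, 64, 3)`, that mismatch is the first thing to fix.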

If steps is the amount of data my generator produces, why does steps=2500 fail? And if it isn't, what does it mean, and how do I control how much data my generator produces? I'd be grateful for advice from anyone who knows this well. I've been asking around all day, and many people don't understand it either!

See here:

“steps: Total number of steps (batches of samples) to yield from generator before stopping.”

What this generally means for Keras 2 is that steps should be equal to the number of examples divided by your batch size (trn.n/batch_size). What you are defining is how many batches you are going to predict.

If you have steps=30 and a batch size of 4, you will get predictions for only 120 examples.
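The arithmetic can be checked with a plain-Python sketch that mimics how a `*_generator` method consumes its input: it pulls exactly `steps` batches and then stops, no matter how much more the generator could produce. `sample_batches` and `consume` are hypothetical names, not Keras internals, and the batches here are lists of sample ids rather than image arrays.

```python
def sample_batches(batch_size):
    # Endless stream of batches of sample ids: [0, 1, 2, 3], [4, 5, 6, 7], ...
    start = 0
    while True:
        yield list(range(start, start + batch_size))
        start += batch_size


def consume(gen, steps):
    # Mimics a *_generator method: pull exactly `steps` batches, then stop.
    seen = []
    for _ in range(steps):
        seen.extend(next(gen))
    return seen


seen = consume(sample_batches(batch_size=4), steps=30)
print(len(seen))  # 120, i.e. 30 steps * 4 samples per batch
```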

Thank you for your answer, but I still don't know how to use my generator with predict_generator. Do I need to call next in the predict function or in the generator? Is there any example code I can refer to?

You don’t use next in the generator … you simply pass in the batches you want to process.
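Put differently, you write a generator function that `yield`s one batch per step and pass the generator object to predict_generator; Keras calls `next()` on it internally. A minimal sketch, assuming the inputs already sit in a NumPy array of shape `(n, height, width, channels)`; `batches_from_array` is a hypothetical name:

```python
import numpy as np


def batches_from_array(x, batch_size):
    """Yield successive slices of x along its first axis, looping forever.

    Each item is a batch of shape (<= batch_size, H, W, C), the kind of
    value a generator passed to predict_generator is expected to yield.
    """
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size]


x = np.zeros((10, 64, 64, 3), dtype=np.float32)
gen = batches_from_array(x, batch_size=4)
# With a real model you pass the generator object and let Keras call next():
#     preds = model.predict_generator(gen, steps=3)
# Here next() is called by hand only to show what Keras would receive.
print([next(gen).shape[0] for _ in range(3)])  # [4, 4, 2]
```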

I’m not sure how far you are into Part 1, but you may want to do it using Python 2 + Keras 1.x instead of using Keras 2. The API is a bit different and using the existing notebooks with python 2/Keras 1 will allow you to step through the code and see what is being passed to Keras and what is coming out. This is how I’ve gone through Part 1 and it was extremely helpful.

So perhaps it will help to explain what exactly this generator is doing.

With no arguments passed to image.ImageDataGenerator, the answer is simple: pretty much nothing. The flow and flow_from_directory methods let you organize images into batches, resize them, and so on, but it's not until you get into data augmentation that this becomes powerful.

If you haven't gotten to lesson 3 yet, just wait and it will all be explained. The Keras documentation says the batches are cycled through indefinitely, so in predict_generator and the other `*_generator` methods you have to tell the method how many steps to take, or else it would keep iterating forever.
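Because the batches cycle indefinitely, `steps` is the only thing that ends the loop, and with a cycling generator a value that is too large does not raise an error: it silently wraps around and repeats samples. A plain-Python sketch, with `itertools.islice` standing in for the "take `steps` batches" logic inside Keras; `cycling_batches` is a hypothetical name:

```python
from itertools import islice


def cycling_batches(n_samples, batch_size):
    # Loops over sample ids forever, the way Keras directory iterators do.
    while True:
        for start in range(0, n_samples, batch_size):
            yield list(range(start, min(start + batch_size, n_samples)))


# 6 samples in batches of 2 give 3 batches per pass; asking for 5 steps wraps around.
taken = list(islice(cycling_batches(6, 2), 5))
print(taken)  # [[0, 1], [2, 3], [4, 5], [0, 1], [2, 3]]
```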

As the documentation says, the number of steps is usually equal to the number of unique examples divided by the batch size, in other words the number of batches, which is why the provided notebooks refer to e.g. batches.n or val_batches.n.
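When the number of samples is not a multiple of the batch size, the last batch is smaller, so the step count that covers every sample once is the ceiling of the division rather than the floor. A small sketch; `steps_for` is a hypothetical helper, and in the notebooks `n_samples` would be the sample count of an iterator (such as `batches.n`):

```python
import math


def steps_for(n_samples, batch_size):
    # Smallest number of batches that covers every sample exactly once.
    return int(math.ceil(n_samples / batch_size))


print(steps_for(1000, 4))  # 250
print(steps_for(1001, 4))  # 251: the last batch holds a single sample
```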

Hope this helps, and let me know if anything is still unclear!

Thank you, your explanation is perfect and I now understand what steps means. But I wrote my own generator, and it doesn't seem to use batch_size. I've spent the last few days trying to predict with predict_generator, but I don't know how to control how much prediction data is returned. I map the predictions back to coordinates to display the results, but out of 200 loop iterations I only get 12 results. I don't know where the problem is; can you help me? I've pasted the code below.

This is the generator code:

```python
# canshu ("parameters"): caches [dimensions, s1, s2, widths, heights] for pre_gen
canshu = ['']
def get_steps(img, widths, heights):
    tm = LSF.type_models[0]
    slide = openslide.open_slide("data/" + tm + "/test/" + img)
    s = slide.dimensions
    s1 = int((s[0]) / widths)
    s2 = int((s[1]) / heights)
    if s[0] % widths != 0:
        s1 = s1+1
    if s[1] % heights != 0:
        s2 = s2+1
    canshu[0] = ([s, s1, s2, widths, heights])
    return s1*s2

coordinatesList = []
def pre_gen(img):
    tm = LSF.type_models[0]
    slide = openslide.open_slide("data/" + tm + "/test/" + img )
    s, s1, s2, widths, heights = canshu[0]
    for i in range(s1):
        for j in range(s2):
            if s[0]-(i*widths) < widths:
                width = s[0]-(i*widths)
                width = widths
            if s[1]-(j*heights) < heights:
                height = s[1]-(j*heights)
                height = heights
        x = (slide.read_region((i * widths, j * heights), 0, (width, height)))
        x = np.resize(x, [150, 150, 3])
        x = np.array(x)[:, :, :3] / 255.0
        x = np.expand_dims(x, axis=0)
        coordinatesList.append([(i * widths)/s[0], (j * heights)/s[0], width/s[0], height/s[0]])
        yield x
```
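Two things in this generator stand out. The lines from `x = (slide.read_region(...))` to `yield x` are indented inside the outer `for i` loop but after the inner `for j` loop, so each pass yields only `s1` tiles instead of the `s1*s2` that `get_steps` returns, and a finite generator that yields fewer items than `steps` makes predict_generator fail once it runs dry. Also, each `if` assigns the clipped size and then immediately overwrites it with the full tile size, and when the condition is false `width`/`height` are never assigned at all. The sketch below restructures the loop under stated assumptions: `read_tile(x, y, w, h)` is a hypothetical callback returning an `(h, w, 3)` uint8 array (with openslide it could wrap `slide.read_region`), edge tiles are zero-padded rather than resized, and `tile_grid` / `tile_batches` are hypothetical names.

```python
import math

import numpy as np


def tile_grid(width, height, tile_w, tile_h):
    """Return (x, y, w, h) for every tile covering a width x height image.

    Edge tiles are clipped at the border, so there are
    ceil(width / tile_w) * ceil(height / tile_h) tiles, the count get_steps returns.
    """
    tiles = []
    for i in range(int(math.ceil(width / tile_w))):
        for j in range(int(math.ceil(height / tile_h))):
            x, y = i * tile_w, j * tile_h
            tiles.append((x, y, min(tile_w, width - x), min(tile_h, height - y)))
    return tiles


def tile_batches(read_tile, tiles, tile_w, tile_h):
    """Yield one (1, tile_h, tile_w, 3) batch per tile, looping forever.

    read_tile(x, y, w, h) is a hypothetical callback returning an (h, w, 3)
    uint8 array; edge tiles are zero-padded so every batch has the same shape.
    """
    while True:
        for x, y, w, h in tiles:  # the yield sits inside this loop: one batch per tile
            batch = np.zeros((1, tile_h, tile_w, 3), dtype=np.float32)
            batch[0, :h, :w, :] = read_tile(x, y, w, h) / 255.0
            yield batch


# Usage with an in-memory image (rows x cols x channels) instead of a slide.
image = np.full((1200, 1100, 3), 255, dtype=np.uint8)
tiles = tile_grid(width=1100, height=1200, tile_w=500, tile_h=500)
gen = tile_batches(lambda x, y, w, h: image[y:y + h, x:x + w], tiles, 500, 500)
print(len(tiles))  # 9
```

Note that `np.resize` does not rescale an image; it repeats or truncates the raw values. If tiles must match a different model input size (the original code targets 150x150), PIL's `Image.resize`, applied after `convert('RGB')` on the RGBA image that openslide's `read_region` returns, is the usual tool.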

This is the prediction code:

```python
dict_predict = dict(predict_pro='0', predict_probility='0')
prob_dict = {}
def predict_model(img, modelPath):
    del generator.coordinatesList[:]
    widths = 500
    heights = 500
    generator.get_steps(img, widths, heights)
    steps =generator.get_steps(img,widths,heights)
    model = load_model(modelPath)
    prob_value_list = []
    for i in range(200):
        predict_pro = (i + 1) / (float(200))
        predict_probility = (model.predict_generator(generator.pre_gen(img), steps=steps))[0][0]
        prob_value_list.append((model.predict_generator(generator.pre_gen(img), steps=steps))[0])
        prob_dict[str(i)] = prob_value_list[i]

def predictResult():
    predict_coordinate = {}
    for keys, values in prob_dict.items():
        y0 = values[0]
        y1 = values[1]
        if y0 <= y1:
            predict_coordinate[keys] = generator.coordinatesList[int(keys)]
    return predict_coordinate
```
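On the prediction side, each loop iteration calls predict_generator twice (400 full passes in total) and keeps only row `[0]` of the result, so `prob_dict` ends up holding the first tile's prediction 200 times, and `predictResult` then looks up `coordinatesList` by loop index rather than by tile. A sketch of the alternative, assuming the generator yields one tile per step in the same order as a precomputed list of tile coordinates: call predict_generator once with `steps` equal to the number of tiles, then pair row k of the output with tile k, keeping the same `y0 <= y1` test as `predictResult`. `tiles_predicted_positive` and `StubModel` are hypothetical; the stub only exists so the sketch runs without Keras, and a real Keras model would take its place.

```python
import numpy as np


def tiles_predicted_positive(model, make_gen, tiles):
    """Run one prediction pass and return the tiles where class 1 wins.

    make_gen() must return a fresh generator yielding one single-tile batch
    per step, in the same order as `tiles`.
    """
    probs = model.predict_generator(make_gen(), steps=len(tiles))
    # Row k of probs belongs to tile k: [p_class0, p_class1].
    return [tile for tile, (p0, p1) in zip(tiles, probs) if p0 <= p1]


class StubModel:
    # Hypothetical stand-in for a Keras model: scores a batch by its mean pixel value.
    def predict_generator(self, gen, steps):
        rows = []
        for _ in range(steps):
            m = float(next(gen).mean())
            rows.append([1.0 - m, m])
        return np.array(rows)


tiles = [(0, 0, 2, 2), (2, 0, 2, 2), (0, 2, 2, 2)]  # (x, y, w, h) per tile
values = [0.9, 0.1, 0.6]  # fake pixel value of each tile


def make_gen():
    while True:
        for v in values:
            yield np.full((1, 2, 2, 3), v)


print(tiles_predicted_positive(StubModel(), make_gen, tiles))
# [(0, 0, 2, 2), (0, 2, 2, 2)]
```

Passing a factory (`make_gen`) rather than a generator object makes it explicit that each prediction pass starts from the first tile.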

I don't want to cause you trouble, but I have only just started learning machine learning and my skills are limited. I hope you can help me. Thank you!

Can you help me?
