Ensemble + ImageDataGenerator caveat

What I mean by “ensembling”:

Training multiple copies of the same architecture and using the average of their predictions.
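By way of illustration, the averaging step looks like this on some made-up probabilities (two models, two classes; none of these numbers come from the actual experiment):

```python
import numpy as np

# Hypothetical per-class probabilities from two models (rows = samples)
preds = [np.array([[0.6, 0.4], [0.2, 0.8]]),
         np.array([[0.8, 0.2], [0.4, 0.6]])]

# Ensemble prediction: element-wise mean across models, then argmax per sample
avg_pred = np.mean(preds, axis=0)
labels = np.argmax(avg_pred, axis=1)
```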

The following worked for ensembling:

nb_class = 43
preds = []
for i in range(6):
    # The test batches are recreated on every iteration
    gen = ImageDataGenerator()
    batches = gen.flow_from_directory(test_path, shuffle=False, batch_size=batch_size, target_size=target_size, class_mode=None)
    model = get_model(nb_class)
    model.load_weights(results_path+"default--30-epochs--1484750156-{}.h5".format(i))
    pred = model.predict_generator(batches, batches.nb_sample)
    preds.append(pred)

The following didn’t work for ensembling:

nb_class = 43
preds = []
# Test batches created once, outside the loop, and reused by every model
gen = ImageDataGenerator()
batches = gen.flow_from_directory(test_path, shuffle=False, batch_size=batch_size, target_size=target_size, class_mode=None)
for i in range(6):
    model = get_model(nb_class)
    model.load_weights(results_path+"default--30-epochs--1484750156-{}.h5".format(i))
    pred = model.predict_generator(batches, batches.nb_sample)
    preds.append(pred)

The difference:

In the example that worked, I remade the test batches each time I used them. In other words, these two lines sit inside the for-loop instead of outside it:

gen = ImageDataGenerator()
batches = gen.flow_from_directory(test_path, shuffle=False, batch_size=batch_size, target_size=target_size, class_mode=None)

Why this surprised me:

When fitting, I used a single generator many times without any trouble.

Why this doesn’t surprise me that much:

Generators are consumed as you use them: once you've iterated through a plain Python generator, it's spent.
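For a plain Python generator, "going away" looks like this (toy example, not Keras):

```python
def batches():
    # Yields three toy "batches", then stops
    for b in [[1, 2], [3, 4], [5, 6]]:
        yield b

gen = batches()
first_pass = list(gen)   # consumes the whole generator
second_pass = list(gen)  # the generator is exhausted, so this is empty
```

(Keras's flow_from_directory actually returns an iterator that cycles forever rather than stopping, but it still carries its position between uses, which is the part that bites here.)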

What I mean by “didn’t work”:

The models’ outputs were wildly different from each other, despite sharing the same architecture. Also, each call to predict_generator on the same model gave different output, even though the test set hadn’t changed. The not-working ensemble led to an accuracy of 24%; the working ensemble led to an accuracy of 98%.

I’m guessing the iterator returned by flow_from_directory keeps its position between calls rather than resetting, so a second predict_generator call starts partway through the test set and the predictions no longer line up with the files. Recreating the batches resets that position. The ‘gen =’ line itself is stateless, though, so you can probably move it outside the loop; it’s the flow_from_directory call that needs to stay inside.
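The offset problem can be reproduced without Keras at all. Assuming predict_generator draws whole batches from the cycling iterator until it has seen nb_sample samples, a second pass over a 5-sample test set with batch_size 2 starts one image late. A hypothetical sketch, with itertools.cycle standing in for the directory iterator:

```python
import itertools

files = ["img0", "img1", "img2", "img3", "img4"]  # 5 test images
batch_size = 2

def predict_pass(it, n_samples, batch_size):
    # Draw ceil(n_samples / batch_size) whole batches, the way
    # predict_generator keeps drawing until it has n_samples predictions.
    n_batches = -(-n_samples // batch_size)
    return [list(itertools.islice(it, batch_size)) for _ in range(n_batches)]

it = itertools.cycle(files)  # shared iterator, like reusing `batches`
first = predict_pass(it, len(files), batch_size)
second = predict_pass(it, len(files), batch_size)
# The first pass starts at img0, but the second starts at img1:
# the second model is now predicting on a shifted view of the test set.
```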