What I mean by “ensembling”:
Training multiple copies of the same architecture and using the average of their predictions.
The following worked for ensembling:
nb_class = 43
preds = []
for i in range(6):
    gen = ImageDataGenerator()
    batches = gen.flow_from_directory(test_path, shuffle=False, batch_size=batch_size, target_size=target_size, class_mode=None)
    model = get_model(nb_class)
    model.load_weights(results_path+"default--30-epochs--1484750156-{}.h5".format(i))
    pred = model.predict_generator(batches, batches.nb_sample)
    preds.append(pred)
The following didn’t work for ensembling:
nb_class = 43
preds = []
gen = ImageDataGenerator()
batches = gen.flow_from_directory(test_path, shuffle=False, batch_size=batch_size, target_size=target_size, class_mode=None)
for i in range(6):
    model = get_model(nb_class)
    model.load_weights(results_path+"default--30-epochs--1484750156-{}.h5".format(i))
    pred = model.predict_generator(batches, batches.nb_sample)
    preds.append(pred)
The difference:
In the example that worked, I remade the test batches each time I used them. In other words, the following lines are inside the for-loop instead of outside of it:
gen = ImageDataGenerator()
batches = gen.flow_from_directory(test_path, shuffle=False, batch_size=batch_size, target_size=target_size, class_mode=None)
Why this surprised me:
I had used a single generator many times when fitting, and that worked fine.
Why this doesn’t surprise me that much:
Generators go away when you use them.
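The same behavior is easy to see with a plain Python generator (this is an illustration of consumption in general, not Keras's batch iterators specifically): once you have iterated through it, there is nothing left to yield.

```python
def make_gen():
    # A generator yields each item once; after that it is exhausted.
    return (x for x in [1, 2, 3])

g = make_gen()
first_pass = list(g)   # consumes the generator: [1, 2, 3]
second_pass = list(g)  # the generator is spent: []
```

Remaking the generator (calling make_gen() again) gives you a fresh one, which is what moving flow_from_directory inside the for-loop accomplishes.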
What I mean by “didn’t work”:
The models’ outputs were very different from each other, despite sharing the same architecture. Also, each time I called predict_generator
on the same model, its output changed, even though the test set had not changed. The not-working ensemble led to an accuracy of 24%. The working ensemble led to an accuracy of 98%.
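For completeness, the averaging step itself can be sketched as follows. The preds values here are hypothetical stand-ins for predict_generator output, which returns one (n_samples, nb_class) array of class probabilities per model; I use two tiny models and nb_class = 3 for brevity.

```python
import numpy as np

# Stand-ins for two models' predictions on a 2-image test set, 3 classes.
preds = [
    np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]),
    np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]),
]

# Ensemble: element-wise mean across models, then argmax per image.
avg_pred = np.mean(np.stack(preds), axis=0)
labels = np.argmax(avg_pred, axis=1)
```

This only makes sense when every row of every array corresponds to the same test image, which is exactly what the exhausted generator broke.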