I have three models that I built for the Dogs vs. Cats Redux challenge, and their losses are 0.11, 0.105, and 0.3. I want to let each of them vote on whether the image is a cat or a dog, but what's the best way to do this? (I think this is what an ensemble is.) My first attempt was to add all three predictions together and divide by three, but that really didn't help. My next idea is to use cyclical annealing on the two models that are around 0.15, give each of them something like 5 votes, and throw the other model out completely. I would really appreciate feedback from somebody who has done more with ensembles, and if there are any good links showing techniques for using multiple models, I would appreciate those too!
I was able to get my original idea working. I had to make a few changes. First I was using:
tst_batch = idg_test.flow_from_directory(path+"test/", target_size=(224,224), batch_size=batch_size, shuffle=False)
Predicting with all three models from that one tst_batch wasn't working:
predictions1 = model1.predict_generator(tst_batch, steps=(tst_batch.n/batch_size))
predictions2 = model2.predict_generator(tst_batch, steps=(tst_batch.n/batch_size))
predictions3 = model3.predict_generator(tst_batch, steps=(tst_batch.n/batch_size))
I replaced that with three separate tst_batch variables, one per model. After that I was able to average predictions2 and predictions3 and bring my 0.11 and 0.105 losses down to 0.09009. Next I'm going to try adding predictions1 back in, the one that was worse, to see if it improves my two-model ensemble. Hopefully this helps somebody else; if not, it at least helped me to write it out.
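For anyone who wants to see the averaging step written out, here is a minimal sketch. It assumes each model's predictions are already NumPy arrays of shape (n_images, n_classes) in the same image order. The helper name average_predictions and the toy numbers are made up for illustration and are not from my actual run.

```python
import numpy as np

def average_predictions(prediction_sets):
    """Element-wise mean of several (n_images, n_classes) prediction arrays."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in prediction_sets])
    return stacked.mean(axis=0)

# Toy stand-ins for predictions2 and predictions3 (hypothetical values).
predictions2 = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
predictions3 = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])

ensemble = average_predictions([predictions2, predictions3])
print(ensemble)
```

This only works if every array lists the images in the same order, which is why shuffle=False matters on the test generator.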
This article might be useful if you are working on blending models: https://mlwave.com/kaggle-ensembling-guide/
The article already recommended is very good for understanding the basics of ensembling. For your particular case, I would also point out:
Three models is not many. With fewer than, say, 4-5 models, stacking is usually ruled out and blending is more likely to give results. Usually some weighted average will work best. Another easy thing to try is the geometric mean, which sometimes outperforms other averaging methods.
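As a rough sketch of those two blending options, assuming the predictions are NumPy arrays of shape (n_images, n_classes): the function names and the weights below are hypothetical, and in practice you would tune the weights against a validation set rather than pick them by hand.

```python
import numpy as np

def weighted_average(prediction_sets, weights):
    """Weighted arithmetic mean; weights are normalized to sum to 1."""
    preds = np.stack([np.asarray(p, dtype=float) for p in prediction_sets])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the model axis: (n_models,) x (n_models, n, c) -> (n, c)
    return np.tensordot(w, preds, axes=1)

def geometric_mean(prediction_sets, eps=1e-15):
    """Geometric mean across models, renormalized so each row sums to 1."""
    preds = np.stack([np.clip(np.asarray(p, dtype=float), eps, 1.0)
                      for p in prediction_sets])
    gm = np.exp(np.log(preds).mean(axis=0))
    return gm / gm.sum(axis=1, keepdims=True)

# Toy predictions from two models for a single image (made-up values).
model_a = np.array([[0.9, 0.1]])
model_b = np.array([[0.5, 0.5]])

blend_weighted = weighted_average([model_a, model_b], weights=[3, 1])
blend_geometric = geometric_mean([model_a, model_b])
print(blend_weighted, blend_geometric)
```

The clipping in geometric_mean is there to avoid taking the log of zero when a model outputs a hard 0 for a class.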
Lastly, don’t set your expectations for ensembling too high. It very frequently improves on your models, but only by a small amount. That may be useful in a competition where the last decimals matter, but you are unlikely to get something much different from your best model's performance.
OK, so it probably won’t double your accuracy, but it might make it a bit better if a few predictions get flipped or something. Very good to know. I was definitely expecting I could squeeze more out of it. So I will take a step back and work on improving my current models, then use the averaging ensemble technique as my very last step. Thanks for the insight. Definitely good to lower my expectations.