@twairball I just checked my work.
I didn't use the batch-normalized version. After all, I'm removing the dense layers, and the convolutional layers aren't batch normalized afaik.
So I'm just re-training the last two dense layers and changing the output layer to an 8-unit dense layer with softmax activation, as usual.
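For anyone following along, here is a rough sketch of that setup, not my exact code. It assumes the Keras VGG16 from `tensorflow.keras.applications` as the base, 4096-unit dense layers, and a helper name (`build_finetune_model`) that I made up for illustration. `weights=None` just avoids a download here; in practice you would pass pretrained weights.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_finetune_model(num_classes=8, input_shape=(224, 224, 3),
                         dense_units=4096, weights=None):
    """Hypothetical helper: frozen conv base plus a fresh dense head."""
    # Assumption: VGG16 without its original dense layers (include_top=False).
    base = keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape)

    # Freeze every convolutional layer so only the new head is trained.
    for layer in base.layers:
        layer.trainable = False

    # Two dense layers to re-train, then the 8-way softmax output.
    x = layers.Flatten()(base.output)
    x = layers.Dense(dense_units, activation="relu")(x)
    x = layers.Dense(dense_units, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs=base.input, outputs=outputs)
    # Adadelta is the optimizer mentioned in this post; swap in any other.
    model.compile(optimizer=keras.optimizers.Adadelta(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_finetune_model(weights="imagenet")` would load the pretrained convolutional weights; only the three dense layers at the end have trainable weights.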
However, I noticed something out of the ordinary that I had done...
Inspired by this:
... it seems that I decided to test Adadelta.
However, according to Kaggle, my second-best score is 1.07, and I'm pretty sure that I didn't use Adadelta back then. Unfortunately, I have only recently understood Jeremy's words about reproducibility, and I don't recall which methods I used back then (12 days ago).