@Kradoc, going back to the question of why TensorFlow performs worse than Theano on the same lesson1 notebook -- I suspect this comes back to dimension ordering. Although I did a quick kludge workaround by specifying 'th' dimension ordering in the keras.json file for use with the TensorFlow backend, I am finding places where the arguments you pass depend on this ordering.
For example, in the BatchNormalization API, the axis parameter specifies the axis to normalize over. With TensorFlow ordering the tensor shape is (samples, rows, cols, channels), while with Theano it is (samples, channels, rows, cols). You want to normalize over the channel axis, so for TensorFlow you would pass axis=3, while for Theano you would pass axis=1. I don't know what Keras does under the hood if you have 'th' ordering on a 'tf' backend.
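To make the axis business concrete, here is a minimal NumPy sketch (not the actual Keras implementation) of what normalizing "over channels" means for the two orderings -- the statistics are computed over every axis *except* the channel axis, so the axis you pass has to match the ordering of your data:

```python
import numpy as np

rng = np.random.default_rng(0)
# 'tf' ordering: (samples, rows, cols, channels)
x_tf = rng.normal(5.0, 2.0, size=(8, 4, 4, 3))
# The same data in 'th' ordering: (samples, channels, rows, cols)
x_th = np.transpose(x_tf, (0, 3, 1, 2))

def channel_normalize(x, channel_axis, eps=1e-5):
    """Normalize per channel, reducing over all axes except channel_axis
    (a rough stand-in for what BatchNormalization(axis=...) computes)."""
    reduce_axes = tuple(i for i in range(x.ndim) if i != channel_axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

y_tf = channel_normalize(x_tf, channel_axis=3)  # axis=3 for 'tf' ordering
y_th = channel_normalize(x_th, channel_axis=1)  # axis=1 for 'th' ordering

# Same data, same per-channel statistics: the results agree once we
# undo the transpose.
print(np.allclose(np.transpose(y_tf, (0, 3, 1, 2)), y_th))  # True
```

If you passed axis=1 to data that is actually in 'tf' ordering, you would be normalizing over the rows axis instead of channels -- no error is raised, the statistics are just computed over the wrong slices, which is exactly the kind of silent degradation that could explain the worse results.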
I haven't had a chance to go back through the lesson1 stuff (I am working on lesson 3), but I suspect something similar may be going on there. I don't think lesson 1 uses batch normalization specifically, though I don't remember for sure.