Hi,

I am going through the lesson 3 notebook as well as the notes, and I noticed that the notebook rescales the weights when it changes (or removes) dropout. However, the notes say:

"Everything in the lesson still applies, and the rescaling of weights is still 100% accurate should we be applying classical dropout, but through the inner workings of Keras this step can be disregarded (if you do the rescaling you will end up with weights that are either too small or too large!)"

Clearly this means that the weights in the actual notebook end up being too small or too large, and yet the network still seems to learn okay (perhaps not as well as it would have otherwise, but still). Is there an intuition for why rescaling the weights by the wrong factor doesn't do that much damage? (Perhaps it has something to do with batchnorm somehow?)
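To make sure I'm even asking the right question, here's a rough numerical sketch of the two conventions as I understand them (the variable names and numbers are just my own illustration, not from the notebook). With classical dropout you rescale at test time by the keep probability p; with inverted dropout (which I believe is what Keras implements internally) the rescaling by 1/p happens at train time, so no test-time adjustment is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                      # keep probability
x = np.ones(100_000)         # toy activations, all 1.0 for clarity

mask = rng.random(x.shape) < p   # randomly keep each unit with prob p

# Classical dropout: drop at train time, multiply by p at test time
train_classical = x * mask       # mean is roughly p (half the units are zeroed)
test_classical = x * p           # rescale by p so expectations match training

# Inverted dropout: rescale by 1/p at train time instead
train_inverted = x * mask / p    # mean is roughly 1.0 already
test_inverted = x                # so test time needs no rescaling at all

print(train_classical.mean(), test_classical.mean())  # both ~0.5
print(train_inverted.mean(), test_inverted.mean())    # both ~1.0
```

So if Keras already divides by p during training, applying the classical test-time rescaling on top of that is what produces the too-small or too-large weights the notes warn about, right?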

Thanks!