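I do not claim to understand all the math behind it, but the intuition behind Dropout is quite interesting. You can think of it as a Monte Carlo (MC) sampling process: each forward pass with dropout active uses a different random mask, so it effectively samples a different sub-network.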
I thought posting it here would be more appropriate, though I have posted it in the time series thread before.
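
To make the MC sampling view of Dropout a bit more concrete, here is a rough toy sketch in plain Python. It is not a real network: I am assuming a single linear layer with made-up weights, and the names `dropout_forward` and `mc_dropout_predict` are just mine for illustration. The idea is to keep dropout switched on, run many forward passes, and look at the mean and spread of the outputs.

```python
import random
import statistics

def dropout_forward(x, weights, p_drop, rng):
    """One stochastic forward pass through a toy linear layer.

    Each input is dropped with probability p_drop; survivors are rescaled
    by 1 / (1 - p_drop) (inverted dropout) so the expected output is unchanged.
    """
    keep = 1.0 - p_drop
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            total += wi * xi / keep
    return total

def mc_dropout_predict(x, weights, p_drop=0.5, n_samples=1000, seed=0):
    """Run n_samples passes with dropout active; return (mean, std) of the outputs."""
    rng = random.Random(seed)
    samples = [dropout_forward(x, weights, p_drop, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical input and weights, chosen only for illustration.
x = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 1.0]

mean, std = mc_dropout_predict(x, weights)
print("MC mean:", mean)  # should land near the no-dropout output of 3.0
print("MC std: ", std)   # spread across the sampled sub-networks
```

The mean over the samples plays the role of the prediction, and the standard deviation gives a rough sense of how much the sampled sub-networks disagree. With `p_drop=0.0` every pass is identical, so the spread collapses to zero.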