I have been trying to work out whether the ReLU activation also has a regularization effect similar to dropout. Since ReLU sets any negative value to 0, this seems analogous to dropping that unit's contribution to the following layers. Assuming the inputs and weights are roughly zero-mean and unit-variance (and independent), the pre-activations should be symmetric around zero, so roughly 50% of the activations should be zero. However, for the effect to resemble dropout, the same units should not get switched off for every input; the zero pattern should vary across inputs. I think this can happen with a well-augmented dataset.
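To make my reasoning concrete, here is a small NumPy sketch I used to check the ~50% claim (layer sizes are arbitrary). It also shows the key difference from dropout: the ReLU zero pattern is a deterministic function of the input, not a fresh random mask on every forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, d_in, d_out = 1000, 256, 256

# Zero-mean, unit-variance inputs and weights (independent draws).
x = rng.standard_normal((n_inputs, d_in))
W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)

pre = x @ W                  # pre-activations, roughly symmetric around 0
act = np.maximum(pre, 0.0)   # ReLU

frac_zero = np.mean(act == 0.0)
print(f"fraction of zeroed activations: {frac_zero:.3f}")  # close to 0.5

# Unlike dropout, the mask is not re-sampled: the same input
# always produces the same set of zeroed units.
mask1 = (x[:1] @ W) > 0
mask2 = (x[:1] @ W) > 0
print(np.array_equal(mask1, mask2))  # True: deterministic given the input
```

So any dropout-like randomness would have to come from variation in the inputs themselves (e.g. augmentation), which is the part I am unsure about.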
Is my understanding correct?