I thought the following were interesting; the first two are along the lines of my thinking.
The Hybrid Bootstrap: A Drop-in Replacement for Dropout
Scott, David W.
The hybrid bootstrap is a regularization technique that functions similarly to dropout except that features are resampled from other training points rather than replaced with zeros.
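A rough NumPy sketch of how I read the hybrid bootstrap from the abstract: keep a dropout-style mask, but instead of zeroing the dropped features, fill them in with the same feature taken from randomly chosen other training points. The function name and the rate p below are my own placeholders, not from the paper.

```python
import numpy as np

def hybrid_bootstrap(batch, train_pool, p=0.5, rng=None):
    """Replace a random fraction p of each example's features with the same
    features drawn from randomly chosen rows of train_pool (dropout-style
    mask, but resampled values instead of zeros)."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = batch.shape
    mask = rng.random((n, d)) < p                            # True = resample this feature
    donors = rng.integers(0, train_pool.shape[0], (n, d))    # donor row per (example, feature)
    resampled = train_pool[donors, np.arange(d)]
    return np.where(mask, resampled, batch)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))                          # toy training matrix
x_batch = X_train[:4]
x_aug = hybrid_bootstrap(x_batch, X_train, p=0.5, rng=rng)
```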
Excitation Dropout: Encouraging Plasticity in Deep Neural Networks
Bargal, Sarah Adel
In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout.
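A toy version of the excitation-dropout idea, just to make the contrast with uniform dropout concrete: per-neuron drop probabilities scale with an "evidence" score. I'm using each neuron's share of the layer's total activation as a stand-in for evidence; the paper actually computes evidence via Excitation Backprop and has its own probability formula, so the mapping below is my assumption.

```python
import numpy as np

def excitation_dropout(activations, base_rate=0.5, rng=None):
    """Dropout where each neuron's drop probability is proportional to its
    share of the layer's total (absolute) activation, scaled so the average
    drop rate stays near base_rate. Inverted-dropout rescaling on the keepers."""
    if rng is None:
        rng = np.random.default_rng()
    a = np.abs(activations)
    evidence = a / (a.sum(axis=-1, keepdims=True) + 1e-8)       # per-neuron share, rows sum to 1
    p_drop = np.clip(base_rate * evidence * a.shape[-1], 0.0, 1.0)
    keep = rng.random(a.shape) >= p_drop
    return activations * keep / np.maximum(1.0 - p_drop, 1e-8)

h = np.random.default_rng(1).normal(size=(4, 16))               # fake hidden activations
h_reg = excitation_dropout(h, base_rate=0.5)
```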
Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
By building upon ideas from network quantization and pruning, we learn binary masks that piggyback on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task.
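For Piggyback, here's the shape of the trick as I understand it, as a runnable PyTorch sketch: freeze the pretrained weights, learn a real-valued score per weight, binarize the scores in the forward pass, and let gradients reach them via a straight-through estimator. The class name, threshold, and init values are placeholders of mine, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class Binarize(torch.autograd.Function):
    """Hard-threshold the mask scores; pass gradients straight through."""
    @staticmethod
    def forward(ctx, scores):
        return (scores > 0).float()
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                                     # straight-through estimator

class PiggybackLinear(nn.Module):
    """A frozen pretrained linear layer with a learned binary mask on its weights."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = pretrained.weight.detach()            # frozen backbone weights
        self.bias = pretrained.bias.detach() if pretrained.bias is not None else None
        # Real-valued scores; positive init so the mask starts as all-ones.
        self.mask_scores = nn.Parameter(1e-2 * torch.ones_like(self.weight))

    def forward(self, x):
        mask = Binarize.apply(self.mask_scores)             # binary mask, one per weight
        return nn.functional.linear(x, self.weight * mask, self.bias)

layer = PiggybackLinear(nn.Linear(128, 10))                 # only mask_scores are trainable
out = layer(torch.randn(2, 128))
```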
An interesting paper on tips for combining dropout with batchnorm:
I’ll keep digging for any more non-random dropout / learned regularization strategies.