I thought the following were interesting; the first two are along the lines of my thinking.

The Hybrid Bootstrap: A Drop-in Replacement for Dropout

Kosar, Robert; Scott, David W.

The hybrid bootstrap is a regularization technique that functions similarly to dropout except that features are resampled from other training points rather than replaced with zeros.

http://arxiv.org/abs/1801.07316
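If I'm reading the abstract right, the core idea can be sketched in a few lines of numpy (my own toy version, not the authors' code; `p` here is the resampling rate, playing the role of the dropout rate):

```python
import numpy as np

def hybrid_bootstrap(X, p=0.5, rng=None):
    """Resample each feature from another training point with probability p,
    instead of zeroing it as dropout would. X is a (n_points, n_features) batch."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    donors = rng.integers(0, n, size=(n, d))   # a random donor row per entry
    mask = rng.random((n, d)) < p              # which entries get resampled
    resampled = X[donors, np.arange(d)]        # same feature, different point
    return np.where(mask, resampled, X)
```

So a corrupted entry still comes from the data distribution of that feature rather than being zero, which is the whole pitch of the paper as I understand it.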

Excitation Dropout: Encouraging Plasticity in Deep Neural Networks

Zunino, Andrea; Bargal, Sarah Adel; Morerio, Pietro; Zhang, Jianming; Sclaroff, Stan; Murino, Vittorio

In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout.

http://arxiv.org/abs/1805.09092
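Going from the abstract alone (I haven't checked their exact retain-probability formula), a simplified version of evidence-weighted dropout might look like the sketch below. `evidence` would come from something like excitation backprop saliency; here higher evidence just means a higher chance of being dropped, scaled so the average drop rate matches a base rate:

```python
import numpy as np

def excitation_dropout(activations, evidence, base_rate=0.5, rng=None):
    """Drop neurons with probability proportional to their (positive)
    normalized evidence, rather than uniformly at random. A simplified
    sketch, not the paper's exact formula."""
    rng = np.random.default_rng(rng)
    p = evidence / evidence.sum()                          # normalize evidence
    drop_prob = np.clip(base_rate * p * p.size, 0.0, 1.0)  # mean ~= base_rate
    keep = rng.random(activations.shape) >= drop_prob      # per-neuron coin flip
    return activations * keep
```

With uniform evidence this reduces to standard dropout at `base_rate`, which seems like the sanity check you'd want.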

Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Mallya, Arun; Davis, Dillon; Lazebnik, Svetlana

By building upon ideas from network quantization and pruning, we learn binary masks that piggyback on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task.

http://arxiv.org/abs/1801.06519
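The mask idea sounds like elementwise gating of frozen pretrained weights. A forward-pass sketch (at training time they learn the real-valued mask parameters with a straight-through-style trick, which I'm skipping here; `threshold` is my own placeholder name):

```python
import numpy as np

def piggyback_mask(weights, mask_real, threshold=0.0):
    """Gate frozen pretrained weights with a learned binary mask:
    hard-threshold real-valued mask parameters to {0, 1}, then
    multiply elementwise. Forward pass only."""
    binary = (mask_real > threshold).astype(weights.dtype)
    return weights * binary
```

One mask per task on top of a single shared backbone is what makes this cheap: the per-task storage is one bit per weight.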

An interesting paper on tips for combining dropout with batchnorm:

https://arxiv.org/abs/1801.05134

I’ll keep digging for more non-random dropout / learned regularization strategies.