Adaptive Deep Reuse - cutting training time?

Adaptive Deep Reuse is said to cut training time by exploiting similarities among features in the training data as well as among activations.
Has anyone seen more detail on it than what is reported here?
It would be interesting to read the paper when it comes out.
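
In case it helps the discussion, here is a rough sketch of how I picture the general idea of reusing computation across similar inputs or activations. This is only my guess at the flavor of it, not the method from the paper: the function name `reuse_matmul`, the rounding-based grouping rule, and the `decimals` parameter are all made up for illustration. Rows that look alike are grouped, the layer's matrix product is computed once per group on the group's mean row, and every member of the group reuses that result.

```python
from collections import defaultdict


def reuse_matmul(rows, weights, decimals=1):
    """Approximate rows @ weights with one product per group of similar rows.

    Illustrative only: grouping by rounded values is an assumption made for
    this sketch, not the clustering scheme of any published method.
    """
    # Hypothetical grouping rule: rows that round to the same values share a group.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(round(v, decimals) for v in row)].append(i)

    n_in = len(rows[0])
    n_out = len(weights[0])
    out = [None] * len(rows)
    for members in groups.values():
        # Centroid = element-wise mean of the rows in this group.
        centroid = [sum(rows[i][j] for i in members) / len(members)
                    for j in range(n_in)]
        # The expensive step runs once per group instead of once per row.
        product = [sum(centroid[j] * weights[j][k] for j in range(n_in))
                   for k in range(n_out)]
        for i in members:
            out[i] = product  # every member reuses the same result
    return out, len(groups)


# Four input rows, but only two distinct "kinds" of row.
rows = [[0.52, 1.31], [0.521, 1.311], [2.02, 3.71], [2.019, 3.709]]
weights = [[1.0, 2.0], [3.0, -1.0]]
approx, n_groups = reuse_matmul(rows, weights)
print(n_groups, "products computed for", len(rows), "rows")
```

The result is an approximation: the coarser the grouping (smaller `decimals`), the fewer products get computed and the larger the error. Presumably the "adaptive" part of the name is about tuning that trade-off during training, but that is speculation on my part until the paper is out.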