Thanks for posting this. In case anyone is searching for the papers…
SLIDE
Accelerating SLIDE
As I understand it after a first reading: for a single sample they compute activations only for the few units the input is likely to activate strongly, retrieving those units from a locality-sensitive hash table instead of evaluating the whole layer. The same savings carry over to backpropagation and the weight update, since gradients only flow through the active units. And because different samples tend to touch different units, several samples can safely be processed in parallel with few conflicting updates.
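To make that concrete, here's a toy sketch of the retrieval idea as I read it (not the authors' code; I'm assuming a SimHash-style LSH over each unit's weight vector, and the bucket sizes here are just whatever the random hashes give):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_units, n_bits = 64, 10_000, 12

# Random hyperplanes define a SimHash: vectors with high cosine
# similarity tend to land in the same bucket.
planes = rng.standard_normal((n_bits, d))

def simhash(v):
    bits = planes @ v > 0
    return int("".join("1" if b else "0" for b in bits), 2)

# Layer weights; index each unit by the hash of its weight vector.
W = rng.standard_normal((n_units, d))
buckets = {}
for j in range(n_units):
    buckets.setdefault(simhash(W[j]), []).append(j)

def sparse_forward(x):
    # Look up only the units whose weight vectors collide with the
    # input -- the ones likely to have a large pre-activation --
    # and compute just those dot products, not all n_units of them.
    active = buckets.get(simhash(x), [])
    return active, W[active] @ x

x = rng.standard_normal(d)
ids, acts = sparse_forward(x)
print(f"computed {len(ids)} of {n_units} activations")
```

The backward pass would then update only the rows of `W` listed in `ids`, which is where the training-time savings come from.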
I wonder whether SLIDE applies to visual models that use convolutions. The authors show examples only from recommendation systems and NLP.
