CPU algorithm trains deep neural nets up to 15 times faster than top GPU


Hi originof, hope all is well and you are having a wonderful day!!

This has to be the best paper/news I have had for the past year; anything that will help reduce the need for GPUs is fantastic in my opinion.

Cheers mrfabulous1 :smiley: :smiley:

Thanks for posting this. In case anyone is searching for the papers…


Accelerating SLIDE

As I understand it after a first reading, for a single sample they calculate the activations only for the few units that tend to activate together, looked up via locality-sensitive hash tables. The same savings apply to backpropagation and the weight update. Furthermore, several samples can safely be processed in parallel.
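To make that concrete, here is a minimal toy sketch of the idea (my own illustration, not the authors' code): neurons are pre-bucketed by a SimHash of their weight vectors, and for each input only the neurons in the matching bucket are computed; everything else is treated as zero. The real SLIDE system uses multiple hash tables and more elaborate LSH schemes, so take the names and parameters here as assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N, K = 64, 1000, 8  # input dim, neurons in the layer, hash bits

# Layer weights: one row per neuron.
W = rng.standard_normal((N, D))
b = np.zeros(N)

# SimHash: K random hyperplanes; a vector's code is its sign pattern.
planes = rng.standard_normal((K, D))

def code(v):
    # Pack the K-bit signature into a single integer.
    return int((planes @ v > 0) @ (1 << np.arange(K)))

# Pre-bucket every neuron by the hash of its weight vector.
buckets = {}
for i in range(N):
    buckets.setdefault(code(W[i]), []).append(i)

def sparse_forward(x):
    # Only neurons whose weights hash like the input are "active";
    # all other activations are assumed to be (near) zero and skipped.
    active = buckets.get(code(x), [])
    out = np.zeros(N)
    if active:
        out[active] = np.maximum(W[active] @ x + b[active], 0)  # ReLU
    return out, active

x = rng.standard_normal(D)
y, active = sparse_forward(x)
print(len(active), "of", N, "neurons computed")
```

With K=8 bits each bucket holds roughly N/256 neurons on average, so the forward pass touches only a handful of rows of W instead of all N; backprop would then update only those same rows.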

I wonder whether SLIDE applies to visual models that use convolutions? The authors show examples only from recommendation systems and NLP.



Hi Pomo, hope you’re having a fun weekend. Thanks for the papers and summary.

Cheers mrfabulous1 :smiley: :smiley:

I can only assume it makes use of AVX-512; otherwise, why would Intel want anything to do with it? Also, Ponte Vecchio is coming, and they don’t want to obsolete that. Lots of caveats, I’m sure. I’d like to see where this goes in 1-2 years. I feel like somewhere in the multi-billion-USD AI/ML market, someone would have thought of algorithmic improvements.