Hi originof, hope all is well and you are having a wonderful day!
This has to be the best paper/news I have seen in the past year. Anything that helps reduce the need for GPUs is fantastic in my opinion.
Thanks for posting this. In case anyone is searching for the papers…
As I understand it after a first reading, for a single sample they calculate the activations only for the few units that tend to activate together (using a hash table). The same savings apply to backpropagation and weight update. Furthermore, several samples can safely be processed in parallel.
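To make that concrete for myself, here is a minimal sketch of the idea in plain Python. It is not the authors' implementation: I am assuming a SimHash-style (signed random projection) hash, and the names `simhash`, `SparseLayer`, `active_units` and all parameter values are made up for illustration.

```python
import random
from collections import defaultdict


def simhash(vec, planes):
    """Signed random projection: one bit per hyperplane (assumed hash family)."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


class SparseLayer:
    """Hypothetical layer that evaluates only units retrieved from hash tables."""

    def __init__(self, n_in, n_out, n_bits=4, n_tables=8, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0, 1) for _ in range(n_in)]
                        for _ in range(n_out)]
        # One independent set of random hyperplanes per table.
        self.planes = [[[rng.gauss(0, 1) for _ in range(n_in)]
                        for _ in range(n_bits)]
                       for _ in range(n_tables)]
        # Index every unit by the hash of its weight vector.
        self.tables = []
        for planes in self.planes:
            table = defaultdict(list)
            for unit, w in enumerate(self.weights):
                table[simhash(w, planes)].append(unit)
            self.tables.append(table)

    def active_units(self, x):
        """Units whose weights share a bucket with x in at least one table."""
        active = set()
        for planes, table in zip(self.planes, self.tables):
            active.update(table.get(simhash(x, planes), ()))
        return sorted(active)

    def forward(self, x):
        """ReLU activations for the retrieved units only; the rest are skipped."""
        return {unit: max(0.0, sum(w * v for w, v in zip(self.weights[unit], x)))
                for unit in self.active_units(x)}


layer = SparseLayer(n_in=16, n_out=200)
x = [random.Random(42).gauss(0, 1) for _ in range(16)]
print(len(layer.active_units(x)), "of", len(layer.weights), "units evaluated")
```

The backward pass and weight update would then touch only the keys returned by `forward`, which is where the savings would come from. A real implementation would also have to re-hash units as their weights change, which this sketch ignores.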
I wonder whether SLIDE applies to vision models that use convolutional layers. The authors show examples only for recommendation systems and NLP.
Hi Pomo, hope you’re having a fun weekend. Thanks for the papers and summary.
I can only assume it makes use of AVX-512, otherwise why would Intel want anything to do with it? Also, Ponte Vecchio is coming, and they won’t want to obsolete that. Lots of caveats, I’m sure. I’d like to see where this goes in 1-2 years. I feel like somewhere in the multi-billion USD AI/ML market someone would already have thought of algorithm improvements.