Are all matrix decompositions slow, or is it just SVD?

Running SVD on the 2034 × 26576 matrix of vectors took between 20 and 25 seconds almost every time it ran. Is that the order of latency you would expect from all exact matrix decompositions? No wonder we need to see if we can parallelize it.

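Yes, I think most exact matrix decompositions are slow. (I’m always hesitant to make a statement about “all”, although I can’t think of any counterexamples.) For dense matrices, the standard ones (LU, QR, Cholesky, eigendecomposition, SVD) all take roughly cubic time in the matrix dimension, and that’s why techniques such as approximate and randomized algorithms and parallelization are so important.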
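To make the "approximate and randomized" point concrete, here is a minimal sketch of a randomized truncated SVD based on random projection. It assumes NumPy (the question doesn't say which library was used), and the function name `randomized_svd` and its parameters `n_oversamples` and `n_power_iters` are hypothetical names chosen for illustration. The idea is to project the big matrix onto a small random subspace and run the exact SVD only on that much smaller matrix.

```python
import numpy as np

def randomized_svd(A, rank, n_oversamples=10, n_power_iters=2, seed=0):
    """Approximate the top-`rank` singular triplets of A via random projection.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + n_oversamples, m, n)

    # Sample the column space of A with a random Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))

    # A few power iterations help when the singular values decay slowly.
    for _ in range(n_power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD, but only of the small (k x n) projected matrix.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Small synthetic check: a 200 x 500 matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 500))

U, s, Vt = randomized_svd(A, rank=5)
s_exact = np.linalg.svd(A, compute_uv=False)[:5]
```

This only pays off when you need the top few singular values and vectors rather than the full decomposition. The accuracy depends on how quickly the singular values decay, so it is worth comparing against the exact result on a smaller sample before trusting it on the full matrix.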