Using TruncatedSVD in PySpark

Hi, I have a large list of Book-Titles / User-IDs and I’m trying to use TruncatedSVD for dimensionality reduction. I’m following these steps:

  1. Define the feature columns.
  2. Assemble the features into a vector column.
  3. Apply TruncatedSVD.

But it looks like PySpark’s DataFrame-based `pyspark.ml` API doesn’t have TruncatedSVD built-in, and the equivalent approach with PCA throws an OutOfMemoryError, even on a T4 GPU instance.

While this approach works well without using PySpark, I was wondering if anyone has run into this issue before and how they handled dimensionality reduction in PySpark.
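For reference, the non-Spark route I mean is scikit-learn’s TruncatedSVD, which accepts a sparse matrix directly and so never densifies the interaction matrix (shapes and values below are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Toy sparse Book-Title x User-ID interaction matrix.
rng = np.random.default_rng(0)
interactions = csr_matrix(rng.integers(0, 2, size=(100, 50)).astype(float))

# TruncatedSVD operates on sparse input directly, which is what keeps
# memory usage manageable outside Spark.
svd = TruncatedSVD(n_components=10, random_state=0)
reduced = svd.fit_transform(interactions)
print(reduced.shape)  # (100, 10)
```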

Any tips or references would be really helpful. Thanks a lot!