Hi, I have a large Book-Title / User-ID matrix and I’m trying to use TruncatedSVD for dimensionality reduction. I’m following these steps:
- Define the feature columns.
- Assemble the features into a vector column.
- Apply TruncatedSVD.
But it looks like PySpark doesn’t have a built-in TruncatedSVD, and a similar approach with PCA throws an OutOfMemoryError, even on a T4 GPU.
This approach works fine without PySpark, so I was wondering whether anyone has run into this issue before and how they handled dimensionality reduction in PySpark.
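For comparison, the non-Spark version that does work is roughly this (a minimal sketch on random data; scikit-learn assumed, since that is where TruncatedSVD lives):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical stand-in for the Book-Title x User-ID matrix:
# 100 books (rows) rated by 50 users (columns).
rng = np.random.default_rng(0)
X = rng.random((100, 50))

# Reduce the 50 user-dimensions down to 10 latent components.
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (100, 10)
```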
Any tips or references would be really helpful. Thanks a lot!