Why is PCA applied on transposed data?


(Darshan) #1

The sklearn API docs say that the input to PCA should have shape (n_samples, n_features). My understanding is that we have 2000 samples (the number of movies) and 50 features (the embedding length). The goal here is to reduce the 50 dimensions to 3, so we should use the data as is, without transposing.
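To make the shapes concrete, here is a minimal sketch of what I mean. Random data stands in for the real embeddings, and the name `movie_emb` is just my placeholder, not something from the notebook. It only compares the output shapes of the two orientations; it does not say which one the lesson intends.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the embedding matrix: 2000 movies x 50 factors.
rng = np.random.default_rng(0)
movie_emb = rng.normal(size=(2000, 50))

# Data as is: rows are samples (movies), columns are features (factors).
reduced = PCA(n_components=3).fit_transform(movie_emb)
print(reduced.shape)  # one 3-d point per movie

# Transposed data: now the 50 factors are the "samples"
# and the 2000 movies are the "features".
pca_t = PCA(n_components=3).fit(movie_emb.T)
print(pca_t.components_.shape)             # one 2000-long vector per component
print(pca_t.transform(movie_emb.T).shape)  # one 3-d point per factor
```

So without the transpose, `fit_transform` returns one row per movie, while with the transpose it is `components_` that has one entry per movie along its second axis.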

Am I missing something in the definition of feature vs. sample here? @jeremy