Is it good idea to cluster on word/document embeddings?

Since word (document) embeddings project words/documents into some n-dimensional space, is it a good/valid idea to treat each dimension of the embedding as a separate feature and apply clustering techniques like k-means, etc.?

To keep it short: yes, it does make sense to cluster in this space. I would, however, recommend using something like UMAP or PCA (depending on your goals) to reduce the dimensionality before clustering, as most clustering techniques do not work well in high dimensions.
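For illustration, here is a minimal sketch of that pipeline using scikit-learn (PCA followed by k-means). The `embeddings` array is a random placeholder standing in for your real document vectors, and the component/cluster counts are arbitrary; UMAP (via the `umap-learn` package) could be swapped in for the PCA step.

```python
# Minimal sketch: cluster document embeddings after reducing dimensionality.
# `embeddings` is a placeholder; in practice it would be an
# (n_documents, n_dims) array produced by your embedding model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))  # stand-in for real document vectors

# Reduce to a lower-dimensional space so distance-based clustering behaves better.
reduced = PCA(n_components=50, random_state=0).fit_transform(embeddings)

# Cluster in the reduced space; each remaining dimension is treated as a feature.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(reduced)
print(np.bincount(labels))  # cluster sizes
```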

I believe you didn’t find many hits because, unlike word embedding, document embedding is still far from mature and techniques are still evolving quickly. Thus we are not yet at a stage where people combine standard techniques (as there is no standard).

@nestorDemeure, thank you for the reply. Do you have any blog/reference that points to the latest research work in document embedding?

I have no particular resource in mind, but here is a non-exhaustive list of methods for sentence embedding (so not exactly document embedding, but it often translates from one to the other) in chronological order: awesome-sentence-embedding#encoders