Hi all,
I have a few days on my hands and I was thinking of applying some of the techniques learned in the course to extract keywords from documents and/or to cluster documents, and possibly writing a blog post about it.
I retrieved a few years of Ask Ubuntu questions (user + title + body + keywords) and I would like to answer one or more of the following questions (ideally the first two, and then perhaps work out how to approach the others):
- Can I extract keywords using title and/or body?
- Can I cluster questions together based on their content?
- Can I cluster users based on the type of questions they ask?
- Can I recommend questions to users to answer based on the questions they previously asked or answered?
Does anyone have experience with this and could perhaps share their working pipeline?
I wanted to start with TF-IDF plus a classifier as a benchmark, then apply deep learning to see if I can improve on it. I was thinking that an LSTM architecture could perhaps work for keyword extraction, and I was curious about using Mean-Shift clustering on the TF-IDF matrix, but perhaps some of you have working experience or better ideas?
Thanks!