Doc2Vec post

shgidi · July 26, 2017, 7:55am

Hi, I wrote a post about doc2vec - a nice technique for matching documents, extracting tags, etc.
I hope you’ll like it!

markovbling · July 26, 2017, 11:36am

Great explanation!

Also super interesting to hear about the work you’re doing for ScaleAbout.

3 questions:

1. Did you try simply averaging the word2vec vectors of the document’s words to represent that document instead of using doc2vec?
2. Have you looked at any RNN models to represent the content of a document?
3. Have you tried LDA-2-Vec?

shgidi · July 26, 2017, 1:59pm

Hi Alex, I’m really happy you liked my post!
Re your questions:

1. Averaging all the word vectors would be too noisy, since the texts are around 300 long… However, we do get good results from a two-step model: extracting keywords and averaging their pretrained word2vec vectors.
2. I did try to train an RNN on our tagged docs (same as what Jeremy did in lesson 5), but the results were worse than with a CNN.
3. I had not heard of this technique. I did try using LDA but didn’t get very good results.
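
To make the two-step idea from the first answer concrete, here is a minimal sketch that compares averaging every word vector with averaging only keyword vectors. It is an illustration under assumptions, not the pipeline used in the post: the 3-dimensional `EMBEDDINGS` table is a made-up stand-in for pretrained word2vec vectors, and `extract_keywords` is a hypothetical frequency-based keyword step, since the thread does not say how keywords were extracted.

```python
import math
from collections import Counter

# Toy stand-in for pretrained word2vec vectors (made-up 3-d values, assumption).
EMBEDDINGS = {
    "doc2vec": [0.9, 0.1, 0.0],
    "vectors": [0.8, 0.2, 0.1],
    "documents": [0.7, 0.3, 0.0],
    "the": [0.1, 0.1, 0.1],
    "of": [0.1, 0.0, 0.1],
}
STOPWORDS = {"the", "of", "a", "and", "to", "in"}


def average_vectors(words, embeddings):
    """Return the mean vector of the words that have an embedding, or None."""
    found = [embeddings[w] for w in words if w in embeddings]
    if not found:
        return None
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def extract_keywords(tokens, top_k=3):
    """Hypothetical keyword step: the most frequent non-stopword tokens."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


tokens = "the vectors of the documents the doc2vec of the vectors".split()

# Baseline: average every word, so frequent filler words dilute the result.
all_words_vec = average_vectors(tokens, EMBEDDINGS)

# Two-step variant: pick keywords first, then average only their vectors.
keywords = extract_keywords(tokens)
keyword_vec = average_vectors(keywords, EMBEDDINGS)
```

In this toy input the filler words pull the all-words average toward their own vectors, while the keyword average stays close to the content words. With real data, `EMBEDDINGS` would be replaced by lookups into an actual pretrained model, and two documents could be compared with `cosine` on their keyword vectors.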

msp · May 2, 2018, 2:24pm

Hi @markovbling, I was wondering whether you had used LDA-2-Vec, and if so, what your experiences were! Cheers