Doc2Vec post


(Gidi Shperber) #1

Hi, I wrote a post about doc2vec - a nice technique for matching documents, extracting tags, etc.
I hope you’ll like it!


(alex) #2

Great explanation!

Also super interesting to hear about the work you’re doing for ScaleAbout :slight_smile:

3 questions:

  1. Did you try simply averaging the word2vec vectors of the document’s words to represent that document instead of using doc2vec?

  2. Have you looked at any RNN models to represent the content of a document?

  3. Have you tried LDA-2-Vec?


(Gidi Shperber) #3

Hi Alex, I’m really happy you liked my post!
Re your questions:

  1. Averaging all word vectors would be too noisy, since the texts are around 300 long… However, we do get good results from a two-step model: extracting keywords and averaging their pretrained word2vec vectors.
  2. I did try to train an RNN on our tagged docs (same as what Jeremy did in lesson 5), but the results were worse than with a CNN.
  3. I had not heard of this technique. I did try using LDA but didn't get very good results.
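
To make the two-step idea from point 1 concrete, here is a minimal sketch, not the actual pipeline. The tiny `EMBEDDINGS` table stands in for pretrained word2vec vectors, and the frequency-based `extract_keywords` helper is only an assumed placeholder for whatever keyword extractor is really used; all names and values are hypothetical.

```python
from collections import Counter

import numpy as np

# Toy stand-in for pretrained word2vec vectors (hypothetical values).
EMBEDDINGS = {
    "neural": np.array([0.9, 0.1, 0.0]),
    "network": np.array([0.8, 0.2, 0.1]),
    "training": np.array([0.7, 0.3, 0.0]),
    "the": np.array([0.0, 0.5, 0.5]),
    "of": np.array([0.1, 0.4, 0.6]),
    "a": np.array([0.0, 0.6, 0.4]),
}
STOPWORDS = {"the", "of", "a"}


def average_vectors(words, embeddings):
    """Mean of the vectors of all words that have an embedding."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        dim = len(next(iter(embeddings.values())))
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def extract_keywords(words, top_k=3):
    """Assumed placeholder: the most frequent non-stopwords."""
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_k)]


doc = "the training of a neural network the training of the network".split()

# Baseline: average every word, so common filler words dilute the result.
all_words_vec = average_vectors(doc, EMBEDDINGS)

# Two-step model: pick keywords first, then average only their vectors.
keywords = extract_keywords(doc)
keyword_vec = average_vectors(keywords, EMBEDDINGS)
```

In this toy setup the filler words pull `all_words_vec` toward their own direction, while `keyword_vec` stays close to the content words, which is the noise effect described in point 1.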

#4

Hi @markovbling, I was wondering whether you had used LDA-2-Vec, and if so, what your experiences were! Cheers