Arxiv Document (Not Word) Embeddings - Advice?

@jeremy in class you showed us how you generated word embeddings by trying to predict the next word.

What are your thoughts on training a model to generate the title and category from the content of a paper, rather than to predict the next word? I’ve read about similar approaches for generating news headlines from news articles. I’m thinking this might be a good way to create an embedding for each document, by taking a hidden layer from the network that predicts the title or summary. My goal is to find similar documents by first computing a document-level embedding (a latent representation of each document). My questions are:

  1. Will you be going over this in class? If not, do you have any favorite papers or materials you recommend for this?
  2. Do you recommend I try a sequence-to-sequence model for this (encoder -> decoder)? Or is there a different approach that works better?
  3. Are there any other approaches that you like for creating document-level embeddings? I have some friends who just multiply each word vector in the document by its TF-IDF weight and average the results. This seems like it would be less powerful than an approach that tries to predict the title or headline of an article.
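
For reference, here is a minimal sketch of the TF-IDF-weighted averaging baseline from question 3, as I understand it. The toy word vectors, the tokenized documents, the smoothed IDF formula, and the function names (`tfidf_weighted_doc_vectors`, `cosine`) are all my own assumptions for illustration; real vectors would come from a pretrained model, and my friends' exact weighting may differ.

```python
import math
from collections import Counter


def tfidf_weighted_doc_vectors(docs, word_vectors):
    """Return one vector per document: the TF-IDF-weighted mean of its word vectors.

    docs: list of token lists.
    word_vectors: dict mapping token -> list of floats (toy values here).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    dim = len(next(iter(word_vectors.values())))

    doc_vectors = []
    for doc in docs:
        # Term frequency, ignoring tokens that have no word vector.
        tf = Counter(tok for tok in doc if tok in word_vectors)
        vec = [0.0] * dim
        total_weight = 0.0
        for tok, count in tf.items():
            # Smoothed IDF (an assumed variant), so common tokens keep a positive weight.
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
            weight = count * idf
            total_weight += weight
            for i, x in enumerate(word_vectors[tok]):
                vec[i] += weight * x
        if total_weight > 0:
            vec = [v / total_weight for v in vec]
        doc_vectors.append(vec)
    return doc_vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical 2-d word vectors and tokenized "papers", for illustration only.
word_vectors = {
    "neural": [1.0, 0.0],
    "network": [0.9, 0.1],
    "galaxy": [0.0, 1.0],
    "star": [0.1, 0.9],
    "the": [0.5, 0.5],
}
docs = [
    ["the", "neural", "network"],
    ["the", "network", "neural", "neural"],
    ["the", "galaxy", "star"],
]

doc_vecs = tfidf_weighted_doc_vectors(docs, word_vectors)
sim_same_topic = cosine(doc_vecs[0], doc_vecs[1])
sim_other_topic = cosine(doc_vecs[0], doc_vecs[2])
print(sim_same_topic > sim_other_topic)
```

Finding similar documents then comes down to ranking by cosine similarity between these vectors. Since this ignores word order entirely, that is part of why I suspect a model trained to predict the title could do better.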