Text embeddings using neural networks

How are text embeddings generated using neural networks?

– This is how I think it works, but I’m not sure how we would get the training data.
For a given word, if we want a 50-dimensional representative vector, the final layer would have 50 nodes. Is this correct?

Moreover, in the paper DeViSE: A Deep Visual-Semantic Embedding Model, the language model pre-training section says: “Our skip-gram model used a hierarchical softmax layer for predicting adjacent terms and was trained using a 20-word window with a single pass through the corpus.” So how does predicting the adjacent words give us these embeddings? Are the embeddings we get the corresponding weights?

http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

If you have an embedding that can predict the surrounding words (skip-gram), it should be a good representation.

An alternative way of doing this is to look at the surrounding words and try to predict the target word (CBOW).

We learnt this in part 1 - might be worth revising that section before next week’s class, and let us know if you have any questions about that material.
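To make the skip-gram point concrete, here is a minimal sketch (Keras assumed; the vocabulary size, window, and layer names are made up, not from the DeViSE paper). The word vectors you end up with are literally the weights of the embedding layer; the final softmax layer over the vocabulary is what predicts the surrounding words.

```python
# Minimal skip-gram sketch (hypothetical sizes): the learned word vectors
# are simply the weights of the Embedding layer.
import tensorflow as tf

vocab_size, embed_dim = 10000, 50   # 50-dim vectors live in the embedding layer

target_in = tf.keras.Input(shape=(1,))                       # centre word index
x = tf.keras.layers.Embedding(vocab_size, embed_dim,
                              name="word_vectors")(target_in)
x = tf.keras.layers.Flatten()(x)
# Plain softmax over the vocabulary; the hierarchical softmax in the DeViSE
# paper is a faster approximation of this same step.
out = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)

model = tf.keras.Model(target_in, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training pairs are (centre word, one word from its context window).
# After model.fit(...) on those pairs, the embeddings are just the weights:
word_vectors = model.get_layer("word_vectors").get_weights()[0]  # (vocab_size, embed_dim)
```

So yes, the embeddings are the corresponding weights; predicting adjacent words is only the training objective used to learn them.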

I thought I understood this conceptually and tried to test my understanding before working on the DeViSE paper.

Embeddings are representative vectors in space for any categorical value. In the MovieLens example, the latent information about each movie (type, popularity, genre, etc.) was inferred from the ratings themselves. Extending that logic: if I have flight data between multiple cities, or distance data, I should be able to get 2D coordinates (up to translation, rotation, etc.) for all the cities. The PCA or t-SNE of these should resemble a map of the United States, at least roughly. That is what my intuition says, anyway.
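As a quick sanity check of the “distances give you a map” part of that intuition, classical MDS (a non-neural technique, named here just for the check) recovers 2D coordinates from a distance matrix up to exactly that rotation/translation ambiguity. Rough, approximate distances below:

```python
# Sanity check: recover a rough "map" from pairwise distances with classical MDS.
# Distances (km) are approximate and only illustrative.
import numpy as np
from sklearn.manifold import MDS

cities = ["SFO", "LAX", "ORD", "JFK"]
dist = np.array([            # symmetric pairwise distance matrix
    [   0,  543, 2964, 4139],
    [ 543,    0, 2804, 3974],
    [2964, 2804,    0, 1188],
    [4139, 3974, 1188,    0],
])

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
for city, (x, y) in zip(cities, coords):
    print(f"{city}: ({x:7.0f}, {y:7.0f})")
```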

So I went about downloading a month’s worth of data and tried fitting a neural network with airports as embeddings. I tried visualizing the results in TensorBoard, but could not get anything sensible. I tried both PCA and t-SNE. My intuition says PCA alone should work and we need not go to a non-linear space, but both of them fail. @jeremy Could you please look at my gist and help: airport-embedding.py
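The structure is roughly along these lines (a simplified sketch, Keras assumed; the gist has the full details and the counts, layer sizes, and names here are made up):

```python
# Sketch of the model: separate origin and destination embeddings,
# concatenated and passed through dense layers to regress flight time.
import tensorflow as tf

n_airports, emb_dim = 350, 3        # hypothetical counts / dimensions

origin_in = tf.keras.Input(shape=(1,), name="origin")
dest_in   = tf.keras.Input(shape=(1,), name="dest")

origin_emb = tf.keras.layers.Embedding(n_airports, emb_dim, name="origin_emb")(origin_in)
dest_emb   = tf.keras.layers.Embedding(n_airports, emb_dim, name="dest_emb")(dest_in)

x = tf.keras.layers.Concatenate()([tf.keras.layers.Flatten()(origin_emb),
                                   tf.keras.layers.Flatten()(dest_emb)])
x = tf.keras.layers.Dense(32, activation="relu")(x)
out = tf.keras.layers.Dense(1)(x)   # predicted flight time

model = tf.keras.Model([origin_in, dest_in], out)
model.compile(optimizer="adam", loss="mse")
# model.fit([origin_ids, dest_ids], flight_times, ...)

# After training, the vectors I project with PCA / t-SNE are the weights:
airport_vectors = model.get_layer("origin_emb").get_weights()[0]
```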

We’ll be looking at nearly this exact example in lesson 14. :slight_smile:

I made one change to the model: I used the same embedding of dimension 3 for both the origin and destination airports, and with that I was able to get at least the west coast airports to line up. Even the Hawaii airports are on the map.
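In sketch form, the change looks like this (Keras assumed, names and sizes hypothetical): a single shared Embedding layer applied to both inputs, so each airport gets exactly one 3D vector.

```python
# One shared Embedding layer for both origin and destination airports.
import tensorflow as tf

n_airports = 350                                        # hypothetical count
origin_in = tf.keras.Input(shape=(1,), name="origin")
dest_in   = tf.keras.Input(shape=(1,), name="dest")

shared_emb = tf.keras.layers.Embedding(n_airports, 3, name="airport_emb")
x = tf.keras.layers.Concatenate()([
    tf.keras.layers.Flatten()(shared_emb(origin_in)),
    tf.keras.layers.Flatten()(shared_emb(dest_in)),
])
x = tf.keras.layers.Dense(32, activation="relu")(x)
out = tf.keras.layers.Dense(1)(x)                       # predicted flight time
model = tf.keras.Model([origin_in, dest_in], out)
```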

Notice that LAX comes out south of San Diego, but this makes sense because I used flight time (not distance), and LAX is a busy airport where it might take longer to land or take off.

I was expecting to see the whole of the US map, but could only manage to get this. :disappointed_relieved:


Hope to hear more about this in class today, @jeremy