Visualize Embedding layers

Hi everyone,

I'm trying to get a feel for embedding layers and want to visualise one. I took my pretrained model and mapped the vocab onto the weights of the embedding layer:

    learn = language_model_learner(lm_data, AWD_LSTM, drop_mult=0.3, pretrained=False)
    learn.load_pretrained(wgts_fname=get_extract_path(uuid) + "/models/lm.p.pth", itos_fname=get_extract_path(uuid) + "/lm_itos.p")

    print(lm_data.vocab.itos)
    print(learn.model[0].encoder)

    pca = PCA(n_components=2)
    # move the weights off the GPU and convert to numpy before handing them to sklearn
    embedding_weights = pca.fit_transform(learn.model[0].encoder.weight.data.cpu().numpy())

    embeddings = {}
    for i, word in enumerate(lm_data.vocab.itos):
        weight_list = embedding_weights[i].tolist()
        embeddings[word] = {'x': weight_list[0], 'y': weight_list[1]}

    return JSONResponse(embeddings)

If I check the 2d visualisation, it doesn't look like synonyms end up close to each other. I'm asking myself whether there is something wrong with my index mapping. Did you experience similar issues, or do you see an error?
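One way to check the index mapping independently of PCA is to look up nearest neighbours by cosine similarity in the full embedding space: if the mapping is right, related words should already show up there even when the 2d projection looks scrambled. A minimal sketch with a toy matrix; in the real code, `weights` would be `learn.model[0].encoder.weight.data.cpu().numpy()` and `itos` would be `lm_data.vocab.itos`:

```python
import numpy as np

# Toy stand-in for the real embedding matrix and vocab
# (replace with the encoder weights and itos from the snippet above).
itos = ["good", "great", "bad", "car"]
weights = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [-1.0, 0.0, 0.1],
    [0.0, 1.0, 1.0],
])
stoi = {w: i for i, w in enumerate(itos)}

def nearest(word, k=3):
    """Return the k nearest words by cosine similarity in the full space."""
    v = weights[stoi[word]]
    sims = weights @ v / (np.linalg.norm(weights, axis=1) * np.linalg.norm(v))
    order = np.argsort(-sims)
    return [itos[i] for i in order if itos[i] != word][:k]

print(nearest("good", k=1))  # → ['great']
```

If the neighbours in the full space already look random for common words, the index mapping is the likely culprit; if they look sensible, the problem is just the lossy 2d projection.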

Ultimately, I would like to create a 2d/3d visualisation of an embedding layer after training a language model.
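One thing worth checking before blaming the index mapping: how much variance the first two principal components actually retain. For a high-dimensional embedding (AWD_LSTM uses 400-dimensional embeddings by default), two components often explain only a small fraction, which alone can make the 2d plot look scrambled. A sketch with a random stand-in matrix (the real weights would come from the encoder as above):

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in with the same shape class as an embedding matrix:
# 1000 "words", 400 dimensions. Replace with the real encoder weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(1000, 400))

pca = PCA(n_components=2).fit(weights)
# Fraction of total variance captured by the 2d projection.
print(pca.explained_variance_ratio_.sum())
```

If this number is tiny on your real weights, a 2d scatter will discard almost all of the structure, and something like t-SNE or UMAP (or simply more PCA components) may give a more faithful picture.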
