Part 2 Lesson 10 wiki

What is the idea behind averaging the weights of the embeddings? I didn’t get why we are doing that.

6 Likes

I think it tries to assign semantic value to the numbers in the text. For instance, “1984” needs to convey the concept of “year”, so the embedding/word vectors of these two tokens would be close in the embedding space.

While building the embedding matrix, if you see an unknown item, you need to initialize it with something, right? He’s choosing to use the mean so that it’s easier to tune from that point on.

1 Like
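
To make the mean initialisation concrete, here is a rough sketch; the names enc_wgts, stoi_pretrained and itos_new are placeholders, not the actual fastai variables. Known words copy their pretrained row, unknown words start from the mean row:

import numpy as np

# enc_wgts: pretrained embedding matrix of shape (n_pretrained_words, emb_dim)
# stoi_pretrained: word -> index in the pretrained vocab
# itos_new: index -> word for the new corpus vocab
row_mean = enc_wgts.mean(axis=0)  # mean over all pretrained embedding rows

new_wgts = np.zeros((len(itos_new), enc_wgts.shape[1]), dtype=np.float32)
for i, w in enumerate(itos_new):
    idx = stoi_pretrained.get(w, -1)
    new_wgts[i] = enc_wgts[idx] if idx >= 0 else row_mean  # unknown words start at the mean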

http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking

1 Like
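
For reference, a minimal example of what that chunking looks like; the filename, chunk size and per-chunk function are just placeholders:

import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames instead of one big frame
for chunk in pd.read_csv('train.csv', header=None, chunksize=24000):
    process(chunk)  # process is a placeholder for whatever you do with each chunk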

Would love to check it out, but I haven’t ‘written’ anything in Hindi for over 10 years, even though I am a native speaker. It might also be useful to build a speech recogniser.

1 Like

Or you can define word embeddings directly in your torchtext Field object.
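
Roughly like this, with the older torchtext API (the path, column names and GloVe identifier below are just examples):

from torchtext.data import Field, TabularDataset  # torchtext.legacy.data in newer releases

TEXT = Field(sequential=True, lower=True)
LABEL = Field(sequential=False, use_vocab=False)

train = TabularDataset(path='train.csv', format='csv',
                       fields=[('text', TEXT), ('label', LABEL)])

# attach pretrained word vectors directly while building the vocab
TEXT.build_vocab(train, vectors='glove.6B.100d')
# TEXT.vocab.vectors now holds an (n_vocab, 100) tensor of embeddings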

It’s a way of turning your words into numbers so that related words end up with related representations.

Is there a specific advantage to creating our own pre-trained embedding (using something like WikiText-103) over using GloVe, fastText or word2vec? Because those embeddings were probably trained on a much larger corpus of data.

6 Likes

Anyone else happy we got rid of torchtext? Gawd, I hated that thing!

3 Likes

Jeremy discusses how those corpora were created in one of the Part 1 v1 lessons.

Like, why the mean and not random initialization? Is it because our pretrained model is from Wikipedia and it generalizes better?

4 Likes

I guess it goes to ‘unk’.

I think this is an experiment, and not necessarily the best approach. There are a number of possibilities, e.g. sampling from some other distribution.

It is just one way of doing that. Tomas Mikolov (Word2vec author) proposed finding the variance of the word embeddings and sampling from a uniform distribution.

1 Like
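
Something like this, if I understand the suggestion correctly (enc_wgts again stands for the pretrained embedding matrix):

import numpy as np

var = enc_wgts.var()   # overall spread of the pretrained weights
a = np.sqrt(3 * var)   # U(-a, a) has variance a**2 / 3
unk_row = np.random.uniform(-a, a, size=enc_wgts.shape[1]).astype(np.float32)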

Probably so that we do not have to write different versions of the get_all/get_texts functions.

1 Like

How do you match that? Dictionary?

The WikiText-103 wget cell may download to the wrong directory; you need to specify the prefix:

! wget -nH -r -np -P {PATH} http://files.fast.ai/models/wt103/

5 Likes

Yes. The quality of the results you get from word2vec or fastText just depends on how well each one learns from the data, e.g. how representative the skip-gram modelling is compared to neural pretraining.

Edit: However, like Jeremy said, word2vec and other word embedding methods aren’t as rich as pretrained neural networks.

From what I understand, wikitext103 is a language model that uses an RNN architecture; the code is actually in lm_rnn.py in fastai. The embeddings (fastText, word2vec, GloVe) are trained on a large corpus, but they are just weights, not a language model. They are usually used in the first layer of a NN, and we still need to train the rest of the model to make transfer learning possible.

Correct me if I’m wrong!

9 Likes
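
As a rough sketch of that setup (the random matrix below stands in for a real matrix of GloVe/fastText vectors), the pretrained vectors only fill the first layer, and everything after it is still trained from scratch:

import torch
import torch.nn as nn

vectors = torch.randn(10000, 300)  # placeholder for a real (n_vocab, emb_dim) vector matrix

emb = nn.Embedding.from_pretrained(vectors, freeze=False)  # first layer = pretrained embeddings
rnn = nn.LSTM(input_size=300, hidden_size=256, batch_first=True)  # still trained from scratch

tokens = torch.randint(0, 10000, (2, 20))  # a batch of 2 sequences of 20 token ids
out, _ = rnn(emb(tokens))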

whoa asciidoctor makes documentation look so good