Part 2 Lesson 10 wiki

What is the idea behind averaging the weights of the embeddings? I didn’t get why we are doing that.

6 Likes

I think it tries to assign semantic value to the numbers in the text. For instance, “1984” needs to convey the concept of “year”, so the embedding/word vectors of these two tokens would be close in the embedding space.

While building the embedding matrix, if you see an unknown item, you need to initialize it with something, right? He’s choosing to use the mean so that it’s easier to tune from that point on.

1 Like
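
To make the mean initialisation concrete, here is a rough sketch; the names enc_wgts, stoi_pretrained and itos_new are placeholders, not the actual fastai variables. Known words copy their pretrained row, unknown words start from the mean row:

import numpy as np

# enc_wgts: pretrained embedding matrix of shape (n_pretrained_words, emb_dim)
# stoi_pretrained: word -> index in the pretrained vocab
# itos_new: index -> word for the new corpus vocab
row_mean = enc_wgts.mean(axis=0)  # mean over all pretrained embedding rows

new_wgts = np.zeros((len(itos_new), enc_wgts.shape[1]), dtype=np.float32)
for i, w in enumerate(itos_new):
    idx = stoi_pretrained.get(w, -1)
    new_wgts[i] = enc_wgts[idx] if idx >= 0 else row_mean  # unknown words start at the mean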

http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking

1 Like
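
For reference, a minimal example of what that chunking looks like; the filename, chunk size and per-chunk function are just placeholders:

import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames instead of one big frame
for chunk in pd.read_csv('train.csv', header=None, chunksize=24000):
    process(chunk)  # process is a placeholder for whatever you do with each chunk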

Would love to check it out, but I haven’t ‘written’ anything in Hindi for over 10 years, even though I am a native speaker. It might also be useful to build a speech recogniser.

1 Like

Or you can define word embeddings directly in your torchtext Field object.
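
Roughly like this, with the older torchtext API (the path, column names and GloVe identifier below are just examples):

from torchtext.data import Field, TabularDataset  # torchtext.legacy.data in newer releases

TEXT = Field(sequential=True, lower=True)
LABEL = Field(sequential=False, use_vocab=False)

train = TabularDataset(path='train.csv', format='csv',
                       fields=[('text', TEXT), ('label', LABEL)])

# attach pretrained word vectors directly while building the vocab
TEXT.build_vocab(train, vectors='glove.6B.100d')
# TEXT.vocab.vectors now holds an (n_vocab, 100) tensor of embeddings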

It’s a way of turning your words into numbers so that related words end up with related representations.

Is there a specific advantage to creating our own pre-trained embedding (using something like WikiText-103) over using GloVe, fastText or word2vec? Because those embeddings were probably trained on a much larger corpus of data.

6 Likes

Anyone else happy we got rid of torchtext? Gawd, I hated that thing!

3 Likes

Jeremy discusses how those corpora were created in one of the Part 1 v1 lessons.

Like, why the mean and not random initialization? Is it because our pretrained model is from Wikipedia and it generalizes better?

4 Likes

I guess it goes to ‘unk’.

I think this is an experiment, and not necessarily the best approach. There are a number of possibilities, e.g. sampling from some other distribution.

It is just one way of doing that. Tomas Mikolov (Word2vec author) proposed finding the variance of the word embeddings and sampling from a uniform distribution.

1 Like
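
Something like this, if I understand the suggestion correctly (enc_wgts again stands for the pretrained embedding matrix):

import numpy as np

var = enc_wgts.var()   # overall spread of the pretrained weights
a = np.sqrt(3 * var)   # U(-a, a) has variance a**2 / 3
unk_row = np.random.uniform(-a, a, size=enc_wgts.shape[1]).astype(np.float32)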

Probably so that we do not have to write different versions of the get_all/get_texts functions.

1 Like

How do you match that? Dictionary?

The WikiText-103 wget cell may download to the wrong directory; you need to specify the prefix:

! wget -nH -r -np -P {PATH} http://files.fast.ai/models/wt103/

5 Likes

Yes. The quality of the results you get from word2vec or fastText just depends on how well each one learns from the data, e.g. how representative the skip-gram modelling is compared to neural pretraining.

Edit: However, like Jeremy said, word2vec and other word embedding methods aren’t as rich as pretrained neural networks.

From what I understand, wikitext103 is a language model that uses an RNN architecture; the code is actually in lm_rnn.py in fastai. The embeddings (fastText, word2vec, GloVe) are trained on a large corpus, but they are just weights, not a language model. They are usually used in the first layer of a NN, and we still need to train the rest of the model to make transfer learning possible.

Correct me if I’m wrong!

9 Likes
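
As a rough sketch of that setup (the random matrix below stands in for a real matrix of GloVe/fastText vectors), the pretrained vectors only fill the first layer, and everything after it is still trained from scratch:

import torch
import torch.nn as nn

vectors = torch.randn(10000, 300)  # placeholder for a real (n_vocab, emb_dim) vector matrix

emb = nn.Embedding.from_pretrained(vectors, freeze=False)  # first layer = pretrained embeddings
rnn = nn.LSTM(input_size=300, hidden_size=256, batch_first=True)  # still trained from scratch

tokens = torch.randint(0, 10000, (2, 20))  # a batch of 2 sequences of 20 token ids
out, _ = rnn(emb(tokens))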

whoa asciidoctor makes documentation look so good