In the create_emb function, we divide emb by 3. I don’t recall the reasoning behind this – why is that?
import re
import numpy as np
from numpy.random import normal

# vecs, wordidx, idx2word and vocab_size are assumed to be defined elsewhere
def create_emb():
    n_fact = vecs.shape[1]
    emb = np.zeros((vocab_size, n_fact))
    for i in range(1, len(emb)):
        word = idx2word[i]
        if word and re.match(r"^[a-zA-Z0-9\-]*$", word):
            src_idx = wordidx[word]
            emb[i] = vecs[src_idx]
        else:
            # If we can't find the word in glove, randomly initialize
            emb[i] = normal(scale=0.6, size=(n_fact,))
    # This is our "rare word" id - we want to randomly initialize
    emb[-1] = normal(scale=0.6, size=(n_fact,))
    emb /= 3
    return emb
That, and the 0.6 scale, are horrible hacks. I found that the stddev of the GloVe vectors was about 0.6, so I used that for my randomly generated vectors. Then I divided them all by 3 to get weights similar to what Glorot initialization would provide, IIRC. I can’t quite remember the details, however: I just threw it together one night and forgot to actually document what I was doing. You shouldn’t assume I did a good job of this, so feel free to play around with different scales.
(Although today I’ll show you a better way to handle this!)
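For anyone who wants to poke at the scales themselves, here is a rough sketch of the arithmetic. It is not the notebook's code: the `vecs` matrix below is synthetic stand-in data rather than real GloVe vectors, `glorot_std` is a hypothetical helper, and the fan-in/fan-out values passed to it are only a guess at what you might compare against. It measures the stddev of the "pretrained" matrix, draws random rows at the same scale, and shows that dividing by 3 shrinks the stddev by exactly a factor of 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the GloVe matrix (hypothetical data, not real GloVe vectors).
vecs = rng.normal(scale=0.6, size=(1000, 50))

# Measure the spread of the "pretrained" vectors ...
glove_std = vecs.std()

# ... and draw random rows for unknown words at the same scale.
n_fact = vecs.shape[1]
unknown = rng.normal(scale=glove_std, size=(200, n_fact))

# Dividing by 3 scales the standard deviation by exactly 1/3.
scaled_std = (unknown / 3).std()


def glorot_std(fan_in, fan_out):
    """Stddev of Glorot uniform init, U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out))."""
    return np.sqrt(2.0 / (fan_in + fan_out))


# Which fan_in/fan_out to use for an embedding layer is an assumption here.
print(glove_std, scaled_std, glorot_std(n_fact, n_fact))
```

Swapping in the real `vecs` and your own layer sizes should show how close the divide-by-3 scale actually lands to a Glorot-style one.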
Turns out, not dividing at all doesn’t change the performance of the models by anything noticeable (at least as far as I’ve tested).