Hi everyone,
The wordvectors Python notebook is used to process pretrained weights from GloVe, using the following code:
import pickle
import numpy as np

def get_glove(name):
    # path, res_path, and save_array are defined earlier in the notebook
    with open(path + 'glove.' + name + '.txt', 'r', encoding='utf8') as f:
        lines = [line.split() for line in f]
    words = [d[0] for d in lines]
    # pass a list, not a generator: np.stack requires a sequence
    vecs = np.stack([np.array(d[1:], dtype=np.float32) for d in lines])
    wordidx = {o: i for i, o in enumerate(words)}
    save_array(res_path + name + '.dat', vecs)
    pickle.dump(words, open(res_path + name + '_words.pkl', 'wb'))
    pickle.dump(wordidx, open(res_path + name + '_idx.pkl', 'wb'))
When I tried to process very large pretrained files (e.g. glove.42B.300d), I ran into RAM limits, since I only have 16 GB. Is there a way to modify the code so that it reads the pretrained file line by line, instead of loading everything into RAM at once?
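For reference, here is the direction I was thinking of: an untested two-pass sketch (the name get_glove_streaming and the dim parameter are just for illustration) that counts the lines first and then streams each vector straight into an on-disk np.memmap, so only one line is held in memory at a time. Note it writes a raw memmap .dat rather than save_array's format, so it would have to be read back with np.memmap as well. Would something like this work?

import pickle
import numpy as np

def get_glove_streaming(name, dim):
    # dim is the vector dimension, e.g. 300 for glove.42B.300d
    fname = path + 'glove.' + name + '.txt'
    # Pass 1: count the rows so the memmap can be preallocated.
    with open(fname, 'r', encoding='utf8') as f:
        n_rows = sum(1 for _ in f)
    # The vectors live on disk, not in RAM.
    vecs = np.memmap(res_path + name + '.dat', dtype=np.float32,
                     mode='w+', shape=(n_rows, dim))
    words = []
    # Pass 2: parse one line at a time and write it straight to disk.
    with open(fname, 'r', encoding='utf8') as f:
        for i, line in enumerate(f):
            d = line.rstrip().split(' ')
            words.append(d[0])
            vecs[i] = np.array(d[1:], dtype=np.float32)
    vecs.flush()
    wordidx = {o: i for i, o in enumerate(words)}
    pickle.dump(words, open(res_path + name + '_words.pkl', 'wb'))
    pickle.dump(wordidx, open(res_path + name + '_idx.pkl', 'wb'))

My thinking is that counting the lines costs one extra pass over the file, but it avoids ever holding the full list of split lines, which seems to be what blows up the memory in the original version.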
Thank you.