Lesson 11 Memory Error pickle dump

(Jon Lo) #1

Hi there,

I am getting a memory error when I run the following code. I am using Paperspace’s P5000 machine (16GB)

def get_vecs(lang, ft_vecs):
    vecd = {w: ft_vecs.get_word_vector(w) for w in ft_vecs.get_words()}
    pickle.dump(vecd, open(PATH/f'wiki.{lang}.pkl', 'wb'))
    return vecd

en_vecd = get_vecs('en', en_vecs)
fr_vecd = get_vecs('fr', fr_vecs)

MemoryError                               Traceback (most recent call last)
in ()
----> 1 en_vecd = get_vecs('en', en_vecs)
      2 fr_vecd = get_vecs('fr', fr_vecs)

in get_vecs(lang, ft_vecs)
      1 def get_vecs(lang, ft_vecs):
      2     vecd = {w: ft_vecs.get_word_vector(w) for w in ft_vecs.get_words()}
----> 3     pickle.dump(vecd, open(PATH/f'wiki.{lang}.pkl', 'wb'))
      4     return vecd


I am not using the English and French sentences as a dataset, though I don't think that makes a difference: if I'm understanding the code correctly, I get the memory error when I pickle-dump the fastText English word vectors, which are almost 9GB in size.

Any help would be appreciated. Do I need to upgrade from 16GB, or is there a memory leak somewhere?
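A rough back-of-envelope calculation suggests why 16GB is tight. The vocabulary size and vector dimension below are assumptions (the English wiki fastText model is commonly cited as having roughly 2.5M words with 300-dim float32 vectors; neither figure is stated in this thread):

```python
# Rough memory estimate for the English word-vector dict.
# Assumed figures (not from this thread): ~2.5M words, 300-dim float32 vectors.
n_words = 2_500_000
dim = 300
bytes_per_float = 4

raw_gib = n_words * dim * bytes_per_float / 2**30
print(f"raw vector data ~ {raw_gib:.1f} GiB")  # before any dict overhead
```

That is only the raw array data. Add per-key dict and numpy object overhead, the still-loaded ~9GB model, and pickle's write buffering all resident at the same time, and peak usage can easily exceed 16GB.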


(QQQ) #2

Try running the code line by line for the English vectors, but delete en_vecs and run a garbage collection to free up RAM before proceeding with the pickle.dump:

import fastText as ft
import gc
import pickle

en_vecs = ft.load_model(str(PATH/'wiki.en.bin'))
en_vecd = {w: en_vecs.get_word_vector(w) for w in en_vecs.get_words()}
del en_vecs          # free the ~9GB model before pickling
gc.collect()
with open(PATH/'wiki.en.pkl', 'wb') as f:
    pickle.dump(en_vecd, f)

I was stuck for quite some time and was mulling over buying another 16GB of RAM to supplement the existing 16GB. Luckily, this works.
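As a minimal, self-contained sketch of the same free-then-dump pattern (toy words, list-valued vectors, and a temp path stand in for the real fastText model and PATH):

```python
import gc
import pickle
import tempfile
from pathlib import Path

# Toy stand-in for the loaded model: word -> 300-dim vector (hypothetical data).
big_model = {w: [0.0] * 300 for w in ["the", "cat", "sat"]}

# Build the plain dict we actually want to keep.
vecd = {w: list(v) for w, v in big_model.items()}

# Free the large source object *before* pickling, as in the fix above.
del big_model
gc.collect()

# Dump inside a context manager so the file handle is closed promptly.
path = Path(tempfile.mkdtemp()) / "wiki.toy.pkl"
with open(path, "wb") as f:
    pickle.dump(vecd, f)

# Round-trip check: the reloaded dict matches what we saved.
with open(path, "rb") as f:
    loaded = pickle.load(f)
assert set(loaded) == set(vecd)
```

The key point is ordering: by the time pickle.dump builds its byte stream, the only large object still alive is the dict being saved.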