Hi guys,
I’ve got a dataset of about 100,000 articles totaling around 532 MB.
I’m using a TextList to load it on a P100 machine with 24 GB of RAM on Paperspace.com.
Loading more than 50,000 articles results in a MemoryError.
Surely 24 GB of RAM should be enough for a 530 MB dataset, right?
The largest article is about 128 KB and the average is about 5.3 KB.
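For reference, this is roughly how I measured those numbers (just a quick sanity check, assuming the articles are the plain .txt files under the dataset folder, as in the loading code below):

from pathlib import Path

data_dir = Path("data/sv-wiki-articles-100k")
sizes = [f.stat().st_size for f in data_dir.rglob("*.txt")]  # one entry per article

print(f"articles: {len(sizes)}")
print(f"total:    {sum(sizes) / 1024**2:.0f} MB")
print(f"largest:  {max(sizes) / 1024:.0f} KB")
print(f"average:  {sum(sizes) / len(sizes) / 1024:.1f} KB")

And this is how I’m loading it: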
from fastai.text import *
path = "data/sv-wiki-articles-100k"
bs = 64
data_lm = (TextList.from_folder(path + '/', extensions=['.txt'])
.use_partial_data(sample_pct=0.5)
.split_by_rand_pct(0.1)
.label_for_lm()
.databunch(bs=bs))
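To narrow down where the memory goes, I also built the databunch with increasing sample_pct values and watched the process memory. This is only a rough diagnostic sketch (it assumes psutil is installed; the loading call is the same as above):

import gc
import psutil

proc = psutil.Process()
for pct in (0.1, 0.25, 0.5, 0.75, 1.0):
    try:
        data_lm = (TextList.from_folder(path + '/', extensions=['.txt'])
                   .use_partial_data(sample_pct=pct)
                   .split_by_rand_pct(0.1)
                   .label_for_lm()
                   .databunch(bs=bs))
        rss_gb = proc.memory_info().rss / 1024**3
        print(f"sample_pct={pct}: ok, process RSS ~ {rss_gb:.1f} GB")
        # free the previous databunch before the next, larger attempt
        del data_lm
        gc.collect()
    except MemoryError:
        print(f"sample_pct={pct}: MemoryError")
        break

It gets through roughly the first half of the data; above that (more than about 50,000 articles) it dies with the MemoryError shown below.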
I’ve also tried systematically decreasing the batch size all the way down to 2, which doesn’t change the outcome.
Do you have any suggestions?
The error:
MemoryError Traceback (most recent call last)
<ipython-input-7-3c6a208eb876> in <module>
1 data_lm = (TextList.from_folder(path + '/', extensions=['.txt'])
2 .use_partial_data(sample_pct=1.0)
-> 3 .split_by_rand_pct(0.1)
4 .label_for_lm()
5 .databunch(bs=bs))
...
/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/core.py in array(a, dtype, **kwargs)
271 if np.int_==np.int32 and dtype is None and is_listy(a) and len(a) and isinstance(a[0],int):
272 dtype=np.int64
-> 273 return np.array(a, dtype=dtype, **kwargs)
274
275 class EmptyLabel(ItemBase):
MemoryError:
Thanks for your help!