Hey,
from fastai.text import *  # fastai v1; path and bs are defined earlier

data_lm = (TextList.from_folder(path)
           # Inputs: all the text files in path
           .filter_by_folder(include=['train', 'test', 'unsup'])
           # We may have other temp folders that contain text files, so we only keep what's in train, test and unsup
           .split_by_rand_pct(0.1)
           # We randomly split and keep 10% (10,000 reviews) for validation
           .label_for_lm()
           # We want to do a language model, so we label accordingly
           .databunch(bs=bs))
data_lm.save('data_lm.pkl')
I'm getting a Unicode error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1896: character maps to <undefined>
I tried:
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
but without success.
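My guess at what is going on, as a minimal sketch outside fastai: it assumes the review files are UTF-8 and that Python is opening them with a charmap-based default such as cp1252 (which I believe is what the 'charmap' in the message refers to). The file name, the sample character and the read_text helper are made up for illustration; they are not part of fastai.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole text file using an explicitly chosen encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "review.txt")  # hypothetical file name

    # "Í" (U+00CD) is stored as the bytes 0xC3 0x8D in UTF-8,
    # and 0x8D has no mapping in cp1252.
    with open(sample, "w", encoding="utf-8") as f:
        f.write("Í")

    # Reading the UTF-8 file as cp1252 raises the same kind of error.
    try:
        read_text(sample, "cp1252")
        cp1252_failed = False
        error_message = ""
    except UnicodeDecodeError as err:
        cp1252_failed = True
        error_message = str(err)

    # Reading it with the encoding it was written in works.
    utf8_text = read_text(sample, "utf-8")

print(cp1252_failed)
print(error_message)
```

If this is really the cause, then the files would need to be opened as UTF-8 somewhere, but I don't know where to set that for TextList.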
Please help
Olaf