Help with Vietnamese NLP


I am trying to rerun the Vietnamese notebook and am getting the file not found error at


This seems to be the case with any language. A manual check revealed that the text directory did not have an AA\wiki_00.

I don’t know what the problem here is.

Can you post the full stack trace from the error (surrounded by ```)? It's hard to understand the problem without it.

From looking at the get_wiki function in nlputils it looks like the below line is probably triggering the error:

shutil.move(str(path/'text/AA/wiki_00'), str(path/name))

Check where the function downloaded your file and maybe modify get_wiki to point to that place…
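A quick way to do that check is to search the data folder for whatever the extractor actually wrote. This is only a rough sketch: `find_extracted_files` is a helper name I made up, and it assumes the output files are named `wiki_*`, as the `wiki_00` in the line above suggests.

```python
from pathlib import Path

def find_extracted_files(root):
    """Return every file under root whose name starts with 'wiki_', sorted."""
    root = Path(root)
    if not root.exists():
        # Nothing was extracted here at all
        return []
    return sorted(p for p in root.rglob('wiki_*') if p.is_file())
```

Call it with the same `path` you pass to get_wiki and print each result together with `p.stat().st_size`. An empty list means the extractor wrote nothing, and a size of 0 means it ran but produced no text.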

Thank you! That line does throw the error: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\\.fastai\data\viwiki\text\AA\wiki_00'

The Wikipedia extractor does not seem to work: I simply don't see any output files. I tried setting a specific output file using -o, and the file was empty.

This is the part of the get_wiki function that I think is going wrong:
with working_directory(path):
    if not (path/'wikiextractor').exists(): os.system('git clone')
    os.system("python wikiextractor/ -o - --debug --processes 4 --no_templates " +
        f"--min_text_length 1800 --filter_disambig_pages --log_file log -b 100G -q {xml_fn}")
shutil.move(str(path/'text/AA/wiki_00'), str(path/name))
I added -o to control where it writes its output, and the resulting file turned out to be empty.

I pulled the os.system statement out of the function to check whether it's doing its job, and in my notebook it simply printed 2.

No idea why it does that.
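That 2 is most likely the command's exit status rather than any output: os.system returns a status code and does not show you what the command printed. A nonzero value means the command failed, and Python itself exits with 2 when it cannot open the script it was asked to run. To see the actual error message, one option is subprocess.run with captured output. A minimal sketch follows; `run_and_report` is a hypothetical helper name, and the command at the bottom is just a stand-in that fails on purpose.

```python
import subprocess
import sys

def run_and_report(cmd, cwd=None):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr)."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr

# Stand-in command that writes to stderr and exits with status 2
code, out, err = run_and_report(
    [sys.executable, '-c', 'import sys; sys.stderr.write("boom"); sys.exit(2)'])
print(code, err)
```

Swapping the stand-in for the extractor command (split into a list, with `cwd=path`) should show the message that os.system was hiding.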

get_wiki seems to work for me, although working_directory was throwing an error, so I just copied what it does explicitly:

#with working_directory(path):
prev_cwd = Path.cwd()
if not (path/'wikiextractor').exists(): os.system('git clone')
os.system("python wikiextractor/ --processes 4 --no_templates " +
    f"--min_text_length 1800 --filter_disambig_pages --log_file log -b 100G -q {xml_fn}")