NB 10 Error can't open 'tmp/texts.out'

RogerS49 · March 20, 2020, 7:48am

I must explain my jump ahead. I know you are all very busy, and my bug may be the result of my impatience, but I was hoping to use some of this on the COVID-19 Kaggle challenge.

I suspect this relates to a directory not being set somewhere.

Since finding the fault I have reinstalled both fastcore and fastai2 with -e ".[dev]" this morning, but it still persists.

/home/dl/fastai-2020/fastai2/fastai2/text/core.py(354)train()
352 f"--character_coverage={self.char_coverage} --model_type={self.model_type}",
353 f"--unk_id={len(spec_tokens)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
--> 354 f"--user_defined_symbols={','.join(spec_tokens)}"]))
355 raw_text_path.unlink()
356 return self.cache_dir/'spm.model'

ipdb> a
self = <fastai2.text.core.SentencePieceTokenizer object at 0x7f20061ac210>
raw_text_path = Path('tmp/texts.out')
ipdb> ll
344 def train(self, raw_text_path):
345 "Train a sentencepiece tokenizer on texts and save it in path/tmp_dir"
346 from sentencepiece import SentencePieceTrainer
347 vocab_sz = self._get_vocab_sz(raw_text_path) if self.vocab_sz is None else self.vocab_sz
348 spec_tokens = ['\u2581'+s for s in self.special_toks]
349 q = '"'
350 SentencePieceTrainer.Train(" ".join([
351 f"--input={q}{raw_text_path}{q} --vocab_size={vocab_sz} --model_prefix={q}{self.cache_dir/'spm'}{q}",
352 f"--character_coverage={self.char_coverage} --model_type={self.model_type}",
353 f"--unk_id={len(spec_tokens)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
--> 354 f"--user_defined_symbols={','.join(spec_tokens)}"]))
355 raw_text_path.unlink()
356 return self.cache_dir/'spm.model'
357

ipdb> c

OSError Traceback (most recent call last)
in
1 get_ipython().run_line_magic('debug', '')
----> 2 subword(1000)

in subword(sz)
1 def subword(sz):
2 sp = SubwordTokenizer(vocab_sz=sz)
----> 3 sp.setup(txts)
4 return ' '.join(first(sp([txt]))[:40])

~/fastai-2020/fastai2/fastai2/text/core.py in setup(self, items, rules)
364 for t in progress_bar(maps(*rules, items), total=len(items), leave=False):
365 f.write(f'{t}\n')
--> 366 sp_model = self.train(raw_text_path)
367 self.tok = SentencePieceProcessor()
368 self.tok.Load(str(sp_model))

~/fastai-2020/fastai2/fastai2/text/core.py in train(self, raw_text_path)
352 f"--character_coverage={self.char_coverage} --model_type={self.model_type}",
353 f"--unk_id={len(spec_tokens)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
--> 354 f"--user_defined_symbols={','.join(spec_tokens)}"]))
355 raw_text_path.unlink()
356 return self.cache_dir/'spm.model'

OSError: Not found: ""tmp/texts.out"": No such file or directory Error #2

It occurs during or after this call:

sentencepiece_trainer.cc(116) LOG(INFO) Running command: --input="tmp/texts.out" --vocab_size=1000 --model_prefix="tmp/spm" --character_coverage=0.99999 --model_type=unigram --unk_id=9 --pad_id=-1 --bos_id=-1 --eos_id=-1 --user_defined_symbols=▁xxunk,▁xxpad,▁xxbos,▁xxeos,▁xxfld,▁xxrep,▁xxwrep,▁xxup,▁xxmaj
sentencepiece_trainer.cc(49) LOG(INFO) Starts training with :
TrainerSpec {
input: "tmp/texts.out"
input_format:
model_prefix: "tmp/spm"
model_type: UNIGRAM
vocab_size: 1000
self_test_sample_size: 0
character_coverage: 0.99999
input_sentence_size: 0
shuffle_input_sentence: 1
seed_sentencepiece_size: 1000000
shrinking_factor: 0.75
max_sentence_length: 4192
num_threads: 16
num_sub_iterations: 2
max_sentencepiece_length: 16
split_by_unicode_script: 1
split_by_number: 1
split_by_whitespace: 1
treat_whitespace_as_suffix: 0
user_defined_symbols: ▁xxunk
user_defined_symbols: ▁xxpad
user_defined_symbols: ▁xxbos
user_defined_symbols: ▁xxeos
user_defined_symbols: ▁xxfld
user_defined_symbols: ▁xxrep
user_defined_symbols: ▁xxwrep
user_defined_symbols: ▁xxup
user_defined_symbols: ▁xxmaj
hard_vocab_limit: 1
use_all_vocab: 0
unk_id: 9
bos_id: -1
eos_id: -1
pad_id: -1
unk_piece:
bos_piece:
eos_piece:
pad_piece:
unk_surface: ⁇
}
NormalizerSpec {
name: nmt_nfkc
add_dummy_prefix: 1
remove_extra_whitespaces: 1
escape_whitespaces: 1
normalization_rule_tsv:
}

trainer_interface.cc(267) LOG(INFO) Loading corpus: "tmp/texts.out"

thanks

marii · March 20, 2020, 10:08am

Would it be possible to post an example of your code, run outside the debugger? I am having a hard time figuring out what is going on based solely on the debugger output.

jeremy · March 20, 2020, 1:52pm

That’s no problem, but please remember to use the #part1-v4:non-beginner category. I’ll move this now.

RogerS49 · March 21, 2020, 6:53am

Take the first 11 cells of notebook 10_nlp.ipynb of course-v4 for the overview. The error occurs in the cell with the subword(1000) call.

All I am doing here is running notebooks 10, 11, and 12, which cover NLP, just to make sure they work.

I added tmp in code cell 2, but changing the code won't help if that directory does not exist on the machine. And if creating it is the solution, where should it be placed?
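On the question of where tmp would have to live: a relative path such as tmp/texts.out is interpreted relative to the current working directory of the notebook kernel. A minimal sketch using only pathlib (the path is copied from the error message; nothing here is fastai API):

```python
from pathlib import Path

# Relative path copied from the error message.
rel = Path("tmp/texts.out")

# A relative path is interpreted against the process's current working directory.
cwd = Path.cwd()
absolute = rel.absolute()

print("working directory:", cwd)
print("file would be at: ", absolute)
print("tmp dir exists:   ", (cwd / "tmp").is_dir())
```

Running this in the same notebook as the failing cell shows which tmp directory is actually being looked up.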

RogerS49 · March 21, 2020, 8:14am

I have also noticed that in course-v4/nbs the tmp directory and the texts.out file both exist, and the file can be opened from a terminal with less.

In fact

is most likely suspect because.

The file texts.out has previously been opened to get the size.

The trainer then tries to open it using line 351, where the path is surrounded by double quotes ("tmp/texts.out"), and perhaps fails because of the quotes.

I tested removing the quotes and it works.
Do the same at the end of line 351: remove the {q} surrounding spm.
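To illustrate why literal quotes could matter, here is a small sketch that does not call sentencepiece at all. It assumes the trainer takes the flag value verbatim, with no shell-style unquoting, which is what the doubled quotes in the error message suggest. The helpers spm_input_flag and flag_value are hypothetical names used only for this illustration.

```python
import tempfile
from pathlib import Path

def spm_input_flag(raw_text_path, quote=""):
    # Hypothetical helper: builds only the --input flag from line 351.
    return f"--input={quote}{raw_text_path}{quote}"

def flag_value(flag):
    # Take everything after the first "=", with no shell-style unquoting
    # (an assumption about how the trainer reads its flags).
    return flag.split("=", 1)[1]

with tempfile.TemporaryDirectory() as d:
    raw = Path(d) / "texts.out"
    raw.write_text("some training text\n")

    quoted_value = flag_value(spm_input_flag(raw, quote='"'))
    plain_value = flag_value(spm_input_flag(raw))

    # The quote characters become part of the file name being looked up.
    quoted_exists = Path(quoted_value).exists()
    plain_exists = Path(plain_value).exists()

print(quoted_value.startswith('"'), quoted_exists, plain_exists)
```

Under that assumption the quoted value names a file that does not exist, while the unquoted one is found, which matches the behaviour described above.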

learner4life · May 20, 2020, 12:18am

Hi @RogerS49 I am getting the same error when running notebook 10.

RuntimeError: Permission denied: ""tmp/spm".model": No such file or directory Error #2

How exactly can I resolve this?

RogerS49 · May 20, 2020, 7:47am

Your error is different from mine.

What I did as a first step was to get the latest versions. I was using the 'dev' install as shown below. That is, I did a git pull of fastai2 and fastcore, removed fastai2 and fastcore with pip, and installed them again as shown below.

The analysis of my problem is summed up in the following quotes, where lines 350 to 354 make up a single Python statement. Line 347 shows that the path was accessed successfully before line 350, so it seemed logical that this statement was incorrect somehow.

What I suggest is that you install the latest version in whatever way your system is set up. If the error persists, use %debug and enter the commands from the initial post to get more detailed information, as there is not enough to go on here. Full stack traces, please.

Also, if the problem still persists, use a terminal window to see whether the directory exists and what permissions it has.
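A quick way to confirm which versions are actually installed, sketched with the standard library only (Python 3.8+). The package names are the ones mentioned in this thread; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg):
    # Returns the installed version string, or None if the package is not installed.
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

for pkg in ("fastai2", "fastcore", "sentencepiece"):
    print(pkg, installed_version(pkg))
```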
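The same check can be done from inside the notebook. This is a rough sketch using only the standard library; describe is a hypothetical helper, and the two paths are the ones from the error messages in this thread.

```python
import os
import stat
from pathlib import Path

def describe(path):
    # Summarise existence and access rights of `path` for the current user.
    p = Path(path)
    if not p.exists():
        return {"path": str(p), "exists": False}
    return {
        "path": str(p),
        "exists": True,
        "is_dir": p.is_dir(),
        "mode": stat.filemode(p.stat().st_mode),  # e.g. drwxr-xr-x
        "readable": os.access(p, os.R_OK),
        "writable": os.access(p, os.W_OK),
    }

print(describe("tmp"))
print(describe("tmp/texts.out"))
```

If tmp is reported as missing or not writable, that points to a directory or permission problem and not to the quoting issue.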

regards

learner4life · May 20, 2020, 11:50pm

Hi @RogerS49 The error I am seeing with the sp setup function is shown below. I am using Paperspace, where I again installed the latest versions.

learner4life · May 20, 2020, 11:59pm

@RogerS49 Thanks for your detailed response earlier. I think my issue has something to do with Paperspace, so hopefully their reps will help resolve it.

RogerS49 · May 21, 2020, 7:57am

Perhaps the cache_dir is not set to what fastai expects, or some permission is not set correctly.

RogerS49 · September 10, 2020, 7:25am

I am not able to help right now. You have posted in a topic that is not relevant to your problem, and in an area for non-beginners. Please create a new topic in "Part 1 (2020) > all", where most help will be available.

Regards