What is itos in Language Models

renard · May 11, 2019, 8:47pm

Hello,
Noob question here. What is itos in fastai language models, and how can I generate it from the language model that I trained?

urmas.pitsi · May 11, 2019, 9:21pm

Check this:

https://github.com/fastai/fastai/blob/e5b98c9171c502bfab9734b31d3d85cd0ca83e44/fastai/text/transform.py#L122


        tok = self.tok_func(self.lang)
        if self.special_cases: tok.add_special_cases(self.special_cases)
        return [self.process_text(str(t), tok) for t in texts]


    def process_all(self, texts:Collection[str]) -> List[List[str]]:
        "Process a list of `texts`."
        if self.n_cpus <= 1: return self._process_all_1(texts)
        with ProcessPoolExecutor(self.n_cpus) as e:
            return sum(e.map(self._process_all_1, partition_by_cores(texts, self.n_cpus)), [])


class Vocab():
    "Contain the correspondence between numbers and tokens and numericalize."
    def __init__(self, itos:Collection[str]):
        self.itos = itos
        self.stoi = collections.defaultdict(int,{v:k for k,v in enumerate(self.itos)})


    def numericalize(self, t:Collection[str]) -> List[int]:
        "Convert a list of tokens `t` to their ids."
        return [self.stoi[w] for w in t]


    def textify(self, nums:Collection[int], sep=' ') -> List[str]:

https://docs.fast.ai/text.transform.html
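
To make the `Vocab` excerpt above concrete, here is a minimal standalone sketch of the same idea in plain Python, without fastai. The token list is made up for illustration, and the `textify` helper is my own simplified version rather than the library's exact implementation. `itos` ("index to string") is a list where a token's position is its id, and `stoi` is the reverse lookup.

```python
import collections

# Toy vocabulary; these tokens are invented for illustration only.
itos = ["xxunk", "xxpad", "the", "cat", "sat"]

# stoi is the inverse mapping. Tokens missing from the vocabulary fall back
# to id 0, mirroring the defaultdict(int, ...) in the Vocab class above.
stoi = collections.defaultdict(int, {tok: idx for idx, tok in enumerate(itos)})

def numericalize(tokens):
    """Convert a list of tokens to their ids."""
    return [stoi[t] for t in tokens]

def textify(ids, sep=" "):
    """Convert a list of ids back to a string of tokens (simplified helper)."""
    return sep.join(itos[i] for i in ids)

ids = numericalize(["the", "cat", "purred"])
print(ids)           # [2, 3, 0]  ("purred" is unknown, so it maps to 0)
print(textify(ids))  # the cat xxunk
```

On a trained model's data object, the real list should be available as `data.vocab.itos`, which is the attribute used later in this thread.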

renard · May 12, 2019, 9:15pm

Thank you

msrdinesh · May 29, 2019, 5:23am

Hi, I didn't get the answer. Can you share the exact code to save the itos of a language model?

I ran the following code:
pickle.dump(data.vocab.itos, open(PATH, 'wb'))

Then I loaded in this way,
learn = language_model_learner(data_lm, AWD_LSTM, pretrained_fnames=('model_path.pth', 'itos_path'), pretrained=False, drop_mult=0.3)

I got the following error:

TypeError Traceback (most recent call last)
in
1 # Define language model learner
----> 2 learn = language_model_learner(data_lm, AWD_LSTM , pretrained_fnames = ('fine_tuned_LM_Cap_EA_AD_OI.pth','itos'), pretrained= False, drop_mult=0.3)

/home/AIX_Common/files/opt/anaconda2/envs/ai-gpu/lib/python3.6/site-packages/fastai/text/learner.py in language_model_learner(data, arch, config, drop_mult, pretrained, pretrained_fnames, **learn_kwargs)
215 if pretrained_fnames is not None:
216 fnames = [learn.path/learn.model_dir/f'{fn}.{ext}' for fn,ext in zip(pretrained_fnames, ['pth', 'pkl'])]
--> 217 learn.load_pretrained(*fnames)
218 learn.freeze()
219 return learn

/home/AIX_Common/files/opt/anaconda2/envs/ai-gpu/lib/python3.6/site-packages/fastai/text/learner.py in load_pretrained(self, wgts_fname, itos_fname, strict)
73 "Load a pretrained model and adapts it to the data vocabulary."
74 old_itos = pickle.load(open(itos_fname, 'rb'))
---> 75 old_stoi = {v:k for k,v in enumerate(old_itos)}
76 wgts = torch.load(wgts_fname, map_location=lambda storage, loc: storage)
77 if 'model' in wgts: wgts = wgts['model']

TypeError: 'int' object is not iterable

Please let me know where I went wrong…

renard · July 22, 2019, 11:12am

itos ("index to string") is just a list of all the tokens in the vocabulary of a TextDataBunch; a token's position in the list is its id. If you want to load a pretrained model, here is an example:

learn_lm = language_model_learner(data_lm, AWD_LSTM, pretrained=False)
learn_lm.load_pretrained(wgts_fname=pretrained_lm_file, itos_fname=pickled_itos_file_name)

But if you want to create a classifier based on the language model, then go with:

data_class = TextClasDataBunch.from_df(..., vocab=data_lm.vocab)
class_learner = text_classifier_learner(data_class, AWD_LSTM)
class_learner.load_encoder('path_to_lm_encoder') # you can save encoder with 'learn_lm.save_encoder'
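
For the file-name side of this, here is a rough sketch in plain Python (no fastai needed) of saving itos with pickle and of how `pretrained_fnames` gets expanded, based only on line 216 of the traceback posted earlier in this thread. The toy token list and the temporary directory are stand-ins I made up for `data_lm.vocab.itos` and `learn.path/learn.model_dir`.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for data_lm.vocab.itos (made-up tokens).
itos = ["xxunk", "xxpad", "the", "cat", "sat"]

# Hypothetical stand-in for learn.path/learn.model_dir.
model_dir = Path(tempfile.mkdtemp())

# Save the list under a name ending in .pkl inside the model directory.
itos_file = model_dir / "itos.pkl"
with open(itos_file, "wb") as f:
    pickle.dump(itos, f)

# Sanity check: the pickle should load back as a list of strings.
with open(itos_file, "rb") as f:
    loaded = pickle.load(f)
assert isinstance(loaded, list) and all(isinstance(t, str) for t in loaded)

# Same expansion as line 216 in the traceback: names are given WITHOUT
# extensions, and '.pth' / '.pkl' are appended for you.
pretrained_fnames = ("model_path", "itos")
fnames = [model_dir / f"{fn}.{ext}" for fn, ext in zip(pretrained_fnames, ["pth", "pkl"])]
print([p.name for p in fnames])  # ['model_path.pth', 'itos.pkl']
```

By the same expansion, passing 'model_path.pth' would be looked up as 'model_path.pth.pth'. I can't say for sure that this explains the TypeError above, but checking that the pickle loads back as a list of strings, and that the names are passed without extensions, seems like a reasonable first step.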

Let me know if it helped.

msrdinesh · July 22, 2019, 11:37am

Thanks! Got it…

tbass134 · April 5, 2021, 1:40pm

When I try to use this with a pretrained model, I get this error:
'SequentialRNN' object has no attribute 'get'