Lesson 4. Load pretrained language model from another language

(Bruno Sánchez-Andrade Nuño) #1

I am following Lesson 4 on language models. I am trying to load a pretrained model for another language from the Model Zoo, in particular the Spanish one, which links to these Google files.

I load my data as usual:

from fastai.text import *  # fastai v1; path and bs are defined earlier

data_lm = (TextList.from_csv(path, 'tweets.csv', cols='tweet')
           # Inputs: the 'tweet' column of tweets.csv in path
            .split_by_rand_pct(0.1)
           # Randomly hold out 10% of the tweets for validation
            .label_for_lm()
           # We want a language model, so we label accordingly
            .databunch(bs=bs))
data_lm.show_batch()

and I get this, which seems good:

|idx|text|
|---|---|
|0 |sobre lo que el xxup xxunk sabía de los terroristas de xxmaj las xxmaj xxunk revela cómo funciona el poder en xxmaj españa . y sirve para entender ciertos vetos para que nada cambie . ¿ xxmaj por qué xxup pp , xxup psoe y xxmaj cs xxunk que el xxmaj congreso xxunk ? ¿ xxmaj por qué xxunk hoy ? pic.twitter.com / xxunk xxbos xxmaj si estás inscrito en|
|1 |la mesa nos acompañaron @g_pisarello , @guillemmartnez , xxunk , @jdsato , @m_corrales _ y @tableroglobal . https : / / www.youtube.com / watch?v = xxunk   … xxbos xxmaj disfruta de tu xxunk @pnique que te va a xxunk poco 😉 pic.twitter.com / xxunk xxbos xxmaj las cifras del paro van xxunk , pero la precariedad laboral continúa siendo muy preocupante en xxmaj españa . xxmaj seguiremos trabajando para|
|2 |se puede . xxmaj así lo he dicho en xxmaj valladolid 👇 🏻 pic.twitter.com / xxunk xxbos xxmaj hay tres posibilidades . xxmaj un acuerdo entre las tres derechas . xxmaj un acuerdo entre xxmaj cs y xxup psoe , que xxunk un xxmaj gobierno de derechas . y un xxmaj gobierno progresista al servicio de la gente , que defienda y xxunk los derechos sociales de todos y todas|

Now I create the learner, and I want to use the pretrained model, but I don’t know how.
If I use:

learn = language_model_learner(data_lm,AWD_LSTM)

It works, but I’m pretty sure it’s loading the default English weights pretrained on WikiText-103.
If I use this:

learn = language_model_learner(data_lm,AWD_LSTM,pretrained_fnames='models/')

it fails:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-83-88303e2cf6e2> in <module>
----> 1 learn = language_model_learner(data_lm,AWD_LSTM,pretrained_fnames='models/')

/opt/anaconda3/lib/python3.7/site-packages/fastai/text/learner.py in language_model_learner(data, arch, config, drop_mult, pretrained, pretrained_fnames, **learn_kwargs)
    215             model_path = untar_data(meta[url] , data=False)
    216             fnames = [list(model_path.glob(f'*.{ext}'))[0] for ext in ['pth', 'pkl']]
--> 217         learn.load_pretrained(*fnames)
    218         learn.freeze()
    219     return learn

/opt/anaconda3/lib/python3.7/site-packages/fastai/text/learner.py in load_pretrained(self, wgts_fname, itos_fname, strict)
     72     def load_pretrained(self, wgts_fname:str, itos_fname:str, strict:bool=True):
     73         "Load a pretrained model and adapts it to the data vocabulary."
---> 74         old_itos = pickle.load(open(itos_fname, 'rb'))
     75         old_stoi = {v:k for k,v in enumerate(old_itos)}
     76         wgts = torch.load(wgts_fname, map_location=lambda storage, loc: storage)

FileNotFoundError: [Errno 2] No such file or directory: '~/tweets/models/o.pkl'

but the files are there:

jupyter@my-fastai-instance:~/tweets/models$ ls
itos_pretrained.pkl  model-30k-vocab-noqrnn.pth  model-eswiki-30k-vocab.pth

(I’ve tried renaming the file to o.pkl as it asks)
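A possible explanation for the odd `o.pkl` name, as a sketch only: I am assuming that fastai v1 builds the two file names by zipping `pretrained_fnames` with the extensions `['pth', 'pkl']`, and that it expects a pair of names without extensions (weights first, vocabulary second). The line that builds the names is not shown in the traceback above, so this is a guess, and `build_fnames` is a hypothetical helper that only mimics that behaviour. Under that assumption, passing the string `'models/'` means it is iterated character by character, giving `m.pth` and `o.pkl`.

```python
from pathlib import Path

def build_fnames(base, pretrained_fnames):
    # Hypothetical helper: mimics the ASSUMED fastai v1 behaviour of pairing
    # each name in pretrained_fnames with the extensions 'pth' and 'pkl'.
    return [base / f"{fn}.{ext}" for fn, ext in zip(pretrained_fnames, ["pth", "pkl"])]

base = Path("~/tweets/models")

# A plain string is iterated character by character: 'm' and 'o'.
wrong = build_fnames(base, "models/")

# A pair of names without extensions: weights first, vocabulary second.
# Which of the two .pth files matches AWD_LSTM is an assumption here.
right = build_fnames(base, ["model-30k-vocab-noqrnn", "itos_pretrained"])

print(wrong[1])
print(right)
```

If the assumption holds, the call would look like `language_model_learner(data_lm, AWD_LSTM, pretrained_fnames=['model-30k-vocab-noqrnn', 'itos_pretrained'])`, with both files sitting in the learner's `models` folder.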

How do you load (and check) a pretrained model?
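For the "check" part, one rough sanity check might be to see how many tokens from my own vocabulary also appear in the pretrained vocabulary file. This is a minimal sketch, assuming the `.pkl` file is a plain pickled list of token strings (the traceback suggests it is enumerated as one). `vocab_overlap` is a hypothetical helper, and the small pickle written here is only a stand-in for `itos_pretrained.pkl`.

```python
import pickle
import tempfile
from pathlib import Path

def vocab_overlap(itos_path, data_itos):
    """Fraction of tokens in data_itos that also appear in the pickled vocabulary."""
    with open(itos_path, "rb") as f:
        pretrained_itos = pickle.load(f)  # assumed: a list of token strings
    pretrained = set(pretrained_itos)
    hits = sum(1 for tok in data_itos if tok in pretrained)
    return hits / len(data_itos)

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in vocabulary file; in practice this would be models/itos_pretrained.pkl.
    itos_path = Path(tmp) / "itos_demo.pkl"
    with open(itos_path, "wb") as f:
        pickle.dump(["xxunk", "xxpad", "de", "la", "que"], f)

    # Stand-in for the data vocabulary (data_lm.vocab.itos in fastai v1).
    overlap = vocab_overlap(itos_path, ["xxunk", "de", "la", "gobierno"])

print(overlap)
```

A high overlap with the Spanish vocabulary would at least suggest the right vocabulary file is being picked up, though it says nothing about whether the weights themselves loaded correctly.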

Thanks!
