ULMFIT - Spanish

What did you use as your backbone?

Wiki Corpus in Spanish

I wrote the Spanish model using the V1 interfaces,
and only used the first part of the new data-preparation scripts.
How are people reporting the performance of the LM models?
Current accuracy for the LM is 34%.


We report perplexity, i.e. the exponential of the cross-entropy loss.

I’m sadly kinda limited on resources. Could someone who has trained a Spanish LM with the fastai v1 interfaces share it? It seems the formats have changed and I can’t load the pretrained model from the OP onto v1. It complains about the key 1.decoder.bias not existing.

Anyway, does anyone have a pretrained model in the v1 format? @gsg mentioned having trained one with nice accuracy, maybe you could share yours? Thanks!



  • LSTM language model: 4 epochs, 3.140521 for val loss and 0.376913 accuracy. Perplexity was thus 23.1038
  • QRNN language model: 7 epochs, 3.193912 for val loss and 0.367543 accuracy. Perplexity was thus 24.2884
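For anyone wanting to check how the perplexity figures relate to the validation loss, here is a minimal sketch. It assumes the reported loss is the mean per-token cross-entropy in nats, so that perplexity is simply its exponential. The numbers it prints may differ slightly from the ones quoted above, for example if the original perplexity was averaged per batch (that is a guess, not something stated in the post).

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Return exp(loss), assuming the loss is mean per-token cross-entropy in nats."""
    return math.exp(cross_entropy_loss)


# Validation losses quoted in the post above.
reported = [("LSTM", 3.140521), ("QRNN", 3.193912)]

for name, val_loss in reported:
    print(f"{name}: val loss {val_loss:.6f} -> perplexity {perplexity(val_loss):.2f}")
```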

Pre-trained models can be found here along with the itos file: https://drive.google.com/open?id=1CZftqrMg-MRH9yXV7FRBv6J_NOtBiK-2

I decided to train the LM on fastai v1 myself. I ended up using Google Cloud and taking advantage of their 300 USD credits. This allowed me to set up a V100 instance and just train there. Using QRNNs resulted in ~30 mins per epoch. LSTMs took ~1:00 per epoch. I used a wiki dump and generated a 100M training set, with a 30k vocab. All this to say there’s definitely room for improvement and anyone could go ahead and improve these results.

Shoutout to @sgugger for guiding me along the way and fixing a bug just in time for me to train.

If someone could do some baseline testing with this LM, that’d be sweet.


Great work!

I would love to try a Spanish language model out. Do you have any idea how I can use your saved models? In the notebook they load Fastai’s model like:

learn = language_model_learner(data, pretrained_model=URLs.WT103_1, drop_mult=0.3)

A GoogleDrive URL is not going to work there…
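One possible workaround, sketched under assumptions: download the weights and itos files by hand, copy them into the learner's model directory, and pass their base names through `pretrained_fnames`. This assumes fastai v1 looks for `<name>.pth` and `<name>.pkl` inside `<data path>/models`. The helper `stage_pretrained` and the file extensions used here are hypothetical and not part of fastai.

```python
from pathlib import Path
import shutil


def stage_pretrained(weights_file, itos_file, data_path, model_dir="models"):
    """Copy downloaded files into <data_path>/<model_dir>.

    Returns the two base names (without extension), which is the form
    `pretrained_fnames` is assumed to expect.
    """
    weights_file, itos_file = Path(weights_file), Path(itos_file)
    dest = Path(data_path) / model_dir
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(weights_file, dest / weights_file.name)
    shutil.copy(itos_file, dest / itos_file.name)
    return weights_file.stem, itos_file.stem


# Hypothetical usage (file names and the exact learner signature depend on
# what you downloaded and on your fastai v1 version):
# fnames = stage_pretrained("downloads/model-eswiki-30k-vocab.pth",
#                           "downloads/itos_pretrained.pkl", data.path)
# learn = language_model_learner(data, AWD_LSTM, pretrained_fnames=fnames, drop_mult=0.3)
```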

Hi @lesscomfortable!
Congrats! Awesome work. I would also like to give it a go and train a Spanish LM.
I can see from your folder structure that you used WikiExtractor. Could you please share the command line args you passed to it?

Here is mine
WikiExtractor.py --json -o ../es_wiki+100 ../eswiki-20190120-pages-articles-multistream.xml.bz2 --no-templates --min_text_length 100 --filter_disambig_pages -de gallery,timeline,noinclude --processes 12

Did you use --min_text_length 100 or 1000? It’s not clear from the notebook. Did you use any exclude options like -de gallery,timeline,noinclude?

It would be cool to be on the same page! Thanks

Hey! To be honest, I don’t remember. I do know I used --min_text_length with a few different values until the dataset was as big as I wanted it to be (it’s better to have longer articles since they include more context and are usually higher quality). Hope this helps!

Thanks for getting back to me.


Hello again. I have been checking out your repo again to get an idea of what hyperparams to use for the Spanish LM. It’s really interesting that you could use a max learning rate of 2. Even though the cycle starts out with a lower learning rate, I have never seen a learning rate that high. How was the overall performance of the model?

Hey @lesscomfortable!
Just to let you know I used your repo as a starting point for hyperparam-tuning and achieved SOTA on a dataset for classifying tweets for the 2016 elections in Spain.


That’s amazing Andreas, I am glad you found it useful!

Yeah! Training the wiki language model would definitely have taken longer if I hadn’t had your hyperparams as a reference.
I achieved F1(macro): 0.7308. The previous best was 0.6482. Here is the competition outline. I can say it blew some minds at Universitat de València.
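For readers unfamiliar with the metric: macro F1 is the unweighted mean of the per-class F1 scores, so small classes count as much as large ones. Below is a minimal pure-Python sketch of that definition. The toy labels are made up for illustration and have nothing to do with the competition data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example with three sentiment classes.
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```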


Blowing some minds is always fun :sunglasses:


Very nice!

Any particular reason you used the v0.7 interface? Also, did SGD perform better than Adam for the pre-training?

It feels like we should be picking the better LM so that we can tackle different tasks with it. I posted about mine using the v1 interface which is what I’ve been using.

I’ve been working on a problem myself but haven’t had as much success as you guys. I’ve only been able to beat other DNN models, not the SOTA models, which are mostly SVMs. I’m sure it can be done; I just haven’t gotten around to it, and training takes quite a while since there’s so much data.

EDIT: Your LM was trained for 2 epochs? Can’t really tell from the notebook



I tried to use your pretrained v1 model loading it like this:

weights_pretrained = 'model-eswiki-30k-vocab'
itos_pretrained = 'itos_pretrained'
pretrained_data = (weights_pretrained, itos_pretrained)
learn = language_model_learner(data_lm, AWD_LSTM, pretrained_fnames=pretrained_data, drop_mult=0)

However I receive this error:

RuntimeError: Error(s) in loading state_dict for SequentialRNN:
Missing key(s) in state_dict: "0.rnns.0.weight_hh_l0_raw", "0.rnns.0.module.weight_ih_l0", "0.rnns.0.module.weight_hh_l0", "0.rnns.0.module.bias_ih_l0", "0.rnns.0.module.bias_hh_l0", "0.rnns.1.weight_hh_l0_raw", "0.rnns.1.module.weight_ih_l0", "0.rnns.1.module.weight_hh_l0", "0.rnns.1.module.bias_ih_l0", "0.rnns.1.module.bias_hh_l0", "0.rnns.2.weight_hh_l0_raw", "0.rnns.2.module.weight_ih_l0", "0.rnns.2.module.weight_hh_l0", "0.rnns.2.module.bias_ih_l0", "0.rnns.2.module.bias_hh_l0". 
Unexpected key(s) in state_dict: "0.rnns.0.linear.weight_raw", "0.rnns.0.linear.module.weight", "0.rnns.0.linear.module.bias", "0.rnns.1.linear.weight_raw", "0.rnns.1.linear.module.weight", "0.rnns.1.linear.module.bias", "0.rnns.2.linear.weight_raw", "0.rnns.2.linear.module.weight", "0.rnns.2.linear.module.bias". 

It seems that the current version of fastai (I’m running v1.0.45) expects different keys than before, and I can’t find the correct translation for the new keys.

It worked when I used the noqrnn version. Maybe I should add an argument to the language_model_learner call to use the other weights file.
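The error message itself hints at the cause: the unexpected keys contain `.linear.`, while the missing ones contain `weight_hh_l0`, which is consistent with QRNN weights being loaded into an LSTM model. Here is a small sketch that guesses the layer type from a checkpoint's key names. The helper `guess_rnn_type` is hypothetical, and the heuristic is based only on the key names shown in the error above.

```python
def guess_rnn_type(state_dict_keys):
    """Guess whether a checkpoint holds QRNN or LSTM layers from its key names."""
    keys = list(state_dict_keys)
    has_qrnn = any(".linear." in k for k in keys)      # e.g. 0.rnns.0.linear.weight_raw
    has_lstm = any("weight_hh_l0" in k for k in keys)  # e.g. 0.rnns.0.weight_hh_l0_raw
    if has_qrnn and not has_lstm:
        return "qrnn"
    if has_lstm and not has_qrnn:
        return "lstm"
    return "unknown"


# Key names copied from the error message above.
unexpected = ["0.rnns.0.linear.weight_raw", "0.rnns.0.linear.module.weight"]
missing = ["0.rnns.0.weight_hh_l0_raw", "0.rnns.0.module.weight_ih_l0"]
print("checkpoint looks like:", guess_rnn_type(unexpected))
print("model expects:", guess_rnn_type(missing))

# With a real file you would pass the keys of the loaded state dict, e.g.
# torch.load(path, map_location="cpu"); depending on how it was saved, the
# weights may sit under a nested key (an assumption to verify).
```

If the checkpoint turns out to be a QRNN one, the learner has to be built with QRNN layers as well. How to request that depends on the fastai version, so check whether yours exposes a QRNN option in the model config.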

Hey @imaginary!

I used v0.7 because I reused a lot of the code from the lessons in part 2 and Sebastian Ruder’s imdb scripts. As soon as part 2 2019 comes out I will port to the newest version. I hope they implement attention-based models soon. :slight_smile:

I did not try Adam for pretraining. It took 4 days to train the wiki LM, so I did not fiddle around much. Also, by reusing @lesscomfortable’s params it pretty much worked right from the beginning.

I trained the LM for 10 epochs and only used wiki articles with > 1000 words. I used a script for training, so you cannot find the training stats in any notebook. Here is the output:

Epoch   Train Loss   Val Loss   Acc
 4      3.082257     3.072956   0.3823
 5      3.074148     3.047636   0.38464
 6      3.051877     3.024316   0.387418
 7      3.042151     3.00233    0.389987
 8      3.039151     2.981968   0.392641
 9      3.075293     2.95596    0.396033
10      3.019051     2.917865   0.401308

I lost the output for the first 3 epochs.

Good idea to share the best LM models. I think that is the idea behind the LM zoo.
What is the particular problem you are working on yourself?

This afternoon I will try to import your model into v1 with the help of this function for converting the .h5 file to .pth. I hope it goes well.