Spanish language model achieved SOTA in the General TASS dataset. I achieved a 0.57 F1 Score in the TASS General Corpus Dataset while SOTA was 0.562 (see notebook).
My general approach was the same as Jeremy’s. The only difference was an added ‘tweet-specific’ pre-processing step that gives the model useful tokens that might improve performance.
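For readers wondering what a ‘tweet-specific’ pre-processing step might look like, here is a minimal sketch. It is not the code from the notebook: the placeholder tokens (`xxurl`, `xxuser`, `xxhashtag`) and the function name `preprocess_tweet` are assumptions chosen for illustration.

```python
import re

# Hypothetical placeholder tokens; the original notebook may use different ones.
URL_TOKEN, USER_TOKEN, HASHTAG_TOKEN = "xxurl", "xxuser", "xxhashtag"

_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_USER_RE = re.compile(r"@\w+")
_HASHTAG_RE = re.compile(r"#(\w+)")
_WS_RE = re.compile(r"\s+")


def preprocess_tweet(text):
    """Replace URLs, mentions and hashtag markers with special tokens."""
    # URLs go first so that '#' fragments inside links are not read as hashtags.
    text = _URL_RE.sub(f" {URL_TOKEN} ", text)
    text = _USER_RE.sub(f" {USER_TOKEN} ", text)
    # Keep the hashtag word itself, since it often carries sentiment.
    text = _HASHTAG_RE.sub(rf" {HASHTAG_TOKEN} \1 ", text)
    # Collapse the extra whitespace introduced by the substitutions.
    return _WS_RE.sub(" ", text).strip()


print(preprocess_tweet("@pepe mira esto https://t.co/abc #Elecciones2016 !!"))
```

The idea is that the language model sees one shared token per URL or mention instead of thousands of rare, unique strings.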
I wrote the Spanish model using the V1 interfaces, and only used the first part of the new data-preparation scripts.
How are people reporting the performance of the LM models?
Current accuracy for the LM is 34%.
I’m sadly kinda limited on resources. Could someone who has trained a Spanish LM with the fastai v1 interfaces share it? It seems the formats have changed and I can’t load the pretrained model from the OP onto v1. It complains about the key 1.decoder.bias not existing.
Anyway, does anyone have a pretrained model in the v1 format? @gsg mentioned having trained one with nice accuracy, maybe you could share yours? Thanks!
I decided to train the LM on fastai v1 myself. I ended up using G Cloud services and taking advantage of their 300 USD credits. This allowed me to set up a V100 instance and just train there. Using QRNNs resulted in ~30 mins per epoch. LSTMs took ~1:00 per epoch. I used a wiki dump and generated a 100M training set, with a 30k vocab. All this to say there’s definitely room for improvement and anyone could go ahead and improve these results.
Shoutouts to @sgugger for guiding me along the way and fixing a bug just in time for me to train.
If someone could do some baseline testing with this LM, that’d be sweet.
Hi @lesscomfortable!
Congrats! Awesome work. I would also like to give it a go and train a Spanish LM.
I can see from your folder structure that you used WikiExtractor. Could you please share the command line args you passed to it?
Here is mine: WikiExtractor.py --json -o ../es_wiki+100 ../eswiki-20190120-pages-articles-multistream.xml.bz2 --no-templates --min_text_length 100 --filter_disambig_pages -de gallery,timeline,noinclude --processes 12
Did you use --min_text_length 100 or 1000? It’s not clear from the notebook. Did you use any exclude options like -de gallery,timeline,noinclude?
Hey! To be honest, I don’t remember. I do know I used --min_text_length with a few different values until the dataset was as big as I wanted it to be (better to have longer articles since they include more context and are usually higher quality). Hope this helps!
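As an alternative to re-running the extractor with different thresholds, the articles can also be filtered by length afterwards. The sketch below assumes the `--json` output format with one JSON object per line, each carrying `title` and `text` fields. The function name `iter_long_articles` and the default threshold are made up for illustration.

```python
import json


def iter_long_articles(lines, min_words=1000):
    """Yield (title, text) for articles with at least `min_words` words.

    `lines` is any iterable of JSON strings, e.g. an open file produced
    by WikiExtractor with --json (assumed: one article per line).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        article = json.loads(line)
        text = article.get("text", "")
        # Whitespace splitting is a rough word count, good enough for filtering.
        if len(text.split()) >= min_words:
            yield article.get("title", ""), text


sample = [
    json.dumps({"title": "Corto", "text": "solo tres palabras"}),
    json.dumps({"title": "Largo", "text": "palabra " * 1200}),
]
print([title for title, _ in iter_long_articles(sample)])
```

With real data you would pass an open file handle for each extracted file instead of the `sample` list.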
@lesscomfortable
Hello again. I have been checking out your repo again to get an idea of what hyperparams to use for the Spanish LM. Really interesting that you could use a max learning rate of 2. Even though the cycle starts out with a lower learning rate, I have never seen a learning rate that high. How was the overall performance of the model?
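To make the "starts low, peaks high" shape concrete, here is a rough sketch of a one-cycle style schedule. It is not fastai's implementation and not necessarily the schedule used in the repo. The function name and the `div_factor` and `pct_start` values are illustrative assumptions; only the peak of 2 comes from the post above.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=2.0, div_factor=25.0, pct_start=0.3):
    """Learning rate at `step` for a simple warm-up then anneal cycle."""
    warmup_steps = max(1, int(total_steps * pct_start))
    start_lr = max_lr / div_factor
    if step < warmup_steps:
        frac = step / warmup_steps
        # Cosine interpolation from start_lr up to max_lr.
        return start_lr + (max_lr - start_lr) * (1 - math.cos(math.pi * frac)) / 2
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine decay from max_lr down towards zero.
    return max_lr * (1 + math.cos(math.pi * frac)) / 2


for step in (0, 15, 30, 65, 100):
    print(step, round(one_cycle_lr(step, total_steps=100), 4))
```

Under these assumptions the model only spends a short stretch near the peak, which may be part of why such a high maximum can still train stably.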
Hey @lesscomfortable !
Just to let you know I used your repo as a starting point for hyperparam-tuning and achieved SOTA on a dataset for classifying tweets for the 2016 elections in Spain. Repo
Yeah! Training the wiki language model would definitely have taken longer if I hadn’t had your hyperparams as a reference.
I achieved F1(macro): 0.7308. The previous best was 0.6482. Here is the competition outline. I can say it blew some minds at Universitat de València.
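For anyone comparing numbers, macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The sketch below shows the standard definition in plain Python. The labels are made up, and the competition's official scorer may differ in details such as which label set it averages over.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, purely for illustration.
y_true = ["P", "P", "N", "NEU"]
y_pred = ["P", "N", "N", "NEU"]
print(round(macro_f1(y_true, y_pred), 4))
```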
Any particular reason you used the v0.7 interface? Also, did SGD perform better than Adam for the pre-training?
It feels like we should be picking the better LM so that we can tackle different tasks with it. I posted about mine using the v1 interface which is what I’ve been using.
I’ve been working on a problem myself but haven’t had as much success as you guys. I’ve only been able to beat other DNN models, but not the SOTA models, which are mostly SVMs. I’m sure it can be done; I just haven’t gotten around to it, and training takes quite a while since it’s so much data.
EDIT: Your LM was trained for 2 epochs? Can’t really tell from the notebook.
RuntimeError: Error(s) in loading state_dict for SequentialRNN:
Missing key(s) in state_dict: "0.rnns.0.weight_hh_l0_raw", "0.rnns.0.module.weight_ih_l0", "0.rnns.0.module.weight_hh_l0", "0.rnns.0.module.bias_ih_l0", "0.rnns.0.module.bias_hh_l0", "0.rnns.1.weight_hh_l0_raw", "0.rnns.1.module.weight_ih_l0", "0.rnns.1.module.weight_hh_l0", "0.rnns.1.module.bias_ih_l0", "0.rnns.1.module.bias_hh_l0", "0.rnns.2.weight_hh_l0_raw", "0.rnns.2.module.weight_ih_l0", "0.rnns.2.module.weight_hh_l0", "0.rnns.2.module.bias_ih_l0", "0.rnns.2.module.bias_hh_l0".
Unexpected key(s) in state_dict: "0.rnns.0.linear.weight_raw", "0.rnns.0.linear.module.weight", "0.rnns.0.linear.module.bias", "0.rnns.1.linear.weight_raw", "0.rnns.1.linear.module.weight", "0.rnns.1.linear.module.bias", "0.rnns.2.linear.weight_raw", "0.rnns.2.linear.module.weight", "0.rnns.2.linear.module.bias".
It seems that the current version of fastai (I’m running v1.0.45) expects different keys than before, and I can’t find the correct mapping onto the new keys.
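One way to narrow this kind of error down is to diff the key names directly. The helper below is a small sketch (the name `diff_state_dict_keys` is made up) that works on plain lists of keys, so it runs without PyTorch. The example keys are copied from the error above.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Return which keys the model wants but lacks, and which are left over."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# A few keys taken from the error message, for illustration.
checkpoint = ["0.rnns.0.linear.weight_raw", "0.rnns.0.linear.module.weight"]
model = ["0.rnns.0.weight_hh_l0_raw", "0.rnns.0.module.weight_ih_l0"]

report = diff_state_dict_keys(checkpoint, model)
print(report["missing"])
print(report["unexpected"])
```

With PyTorch you would typically pass the keys of the loaded checkpoint and of `model.state_dict()`. Depending on how the checkpoint was saved, the weights may be nested under another key. The `linear` names on one side and `weight_hh_l0` names on the other might indicate that the checkpoint came from a QRNN-based model while the model being built uses LSTMs, but that is a guess from the key names only.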
I used v0.7 because I reused a lot of the code from the lessons in part 2 and Sebastian Ruder’s imdb scripts. As soon as part 2 2019 comes out I will port to the newest version. I hope they implement attention-based models soon.
I did not try Adam for pretraining. It took 4 days to train the wiki LM, so I did not fiddle around much. Also, by reusing @lesscomfortable’s params it pretty much worked right from the beginning.
I trained the LM for 10 epochs and only used wiki articles with > 1000 words. I used a script for training, so the training stats are not in any notebook. Here is the output of the training stats: