ULMFIT - Spanish

willtallpear · February 27, 2019, 12:04pm

This afternoon I will try to import your model into fastai v1, using this function to convert the .h5 file to .pth. I hope it goes well.

Andreas_Daiminger · February 27, 2019, 3:48pm

Here is a link to download the weights and the integer-to-string mapping for the LM I trained on Spanish Wikipedia articles. Let me know if you give it a try.

Andreas_Daiminger · February 27, 2019, 3:54pm

Cool. I hadn't seen your post before. I trained on my own Titan Xp for 4 full days with a 60k vocab.

imaginary · February 28, 2019, 2:10pm

If you want to load the QRNN model, you need to set qrnn=True so that the model is built with the right structure and the weights load properly. AFAIK, that’s the only thing you need to change; other than that it should work. Thankfully, LMs can still be loaded even though new structures have been introduced to fastai.

imaginary · February 28, 2019, 2:18pm

Thanks for the thorough reply, @Andreas_Daiminger

Those are truly impressive results for the LM, especially considering the large (60k) vocab. I, too, filtered out the short Wikipedia articles and trained on the rest. I stopped after only 4 epochs with LSTMs, but I could continue training and see where that lands me. I only used a 30k vocab, though.
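Both posts mention capping the vocabulary (60k vs. 30k) and dropping short articles. Below is a minimal pure-Python sketch of those two preprocessing steps. The function names, the min_tokens/max_vocab/min_freq values and the special-token list are illustrative assumptions, loosely modelled on how fastai v1 builds its vocab, not the code either poster actually ran.

```python
from collections import Counter

# Assumed special tokens, modelled on fastai v1's defaults; adjust to your tokenizer
SPECIAL_TOKENS = ["xxunk", "xxpad", "xxbos", "xxfld", "xxmaj", "xxup", "xxrep", "xxwrep"]


def drop_short_articles(tokenized_articles, min_tokens=100):
    """Keep only articles with at least `min_tokens` tokens (threshold is illustrative)."""
    return [art for art in tokenized_articles if len(art) >= min_tokens]


def build_vocab(tokenized_articles, max_vocab=30000, min_freq=2):
    """Return an index-to-string list: special tokens first, then the most
    frequent tokens seen at least `min_freq` times, capped at `max_vocab`."""
    counts = Counter(tok for art in tokenized_articles for tok in art)
    frequent = [tok for tok, c in counts.most_common(max_vocab) if c >= min_freq]
    itos = SPECIAL_TOKENS + [tok for tok in frequent if tok not in SPECIAL_TOKENS]
    return itos[:max_vocab]
```

The embedding matrix gets one row per entry of this list, so moving from a 30k to a 60k cap roughly doubles the size of the embedding (and, with tied weights, of the decoder).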

Something I’m curious about is when training should stop for LMs. In all the stats I’ve seen posted, it looks like training could continue, but I don’t know whether that’s a good idea past a certain threshold. It’s interesting that our models just seem to smash SOTA results(?). Is there a standard benchmark for Spanish LMs, or comparable SOTA results? Is a standard AWD-LSTM the current SOTA for Spanish? So many questions.

Attention-based models are already in the library, but as far as I’m aware no one has trained a better LM with them than with LSTMs or QRNNs. We could give it a shot if you want; I have some experience with the new fastai v1 interface.

Andreas_Daiminger · February 28, 2019, 5:00pm

@imaginary
Wow. So I am not up to date. I definitely want to give attention-based models a shot. We can train on my GPU!

Yes, good question about when to stop training. I would say keep going as long as you have the resources and the loss is going down.
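One way to make "keep going while the loss goes down" concrete is a patience rule on the validation loss, i.e. early stopping. This is a hypothetical pure-Python helper; the names should_stop, patience and min_delta are illustrative, not fastai API. Watching the validation loss rather than the training loss is what catches overfitting.

```python
def should_stop(valid_losses, patience=2, min_delta=0.0):
    """Return True when none of the last `patience` epochs beat the best
    earlier validation loss by at least `min_delta`."""
    if len(valid_losses) <= patience:
        return False  # not enough history yet
    best_before = min(valid_losses[:-patience])
    recent_best = min(valid_losses[-patience:])
    return recent_best > best_before - min_delta
```

Applied to the five validation losses from the TransformerXL run posted later in this thread (6.382536 down to 6.347020), should_stop returns False, since the validation loss was still decreasing slightly.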

willtallpear · February 28, 2019, 11:02pm

I finally managed to import your LM into v1. I still have to run some tests, but I doubt the performance will be as good: the decoder bias is not present in the .h5 files, so when converting to the v1 format I had to use a random tensor as the decoder bias.

Here is my modified version of the converter, for anyone else facing conversion issues:

import os
from collections import OrderedDict

import torch


def convert(path_to_old_model, path_to_save_converted_model):
    """
    path_to_old_model is the path to the old model (before fast.ai v1) and
    path_to_save_converted_model is the directory where the converted model is stored
    """
    # Load the tensors onto the CPU regardless of the device they were saved from
    old_wgts = torch.load(path_to_old_model, map_location=lambda storage, loc: storage)
    new_wgts = OrderedDict()
    # Embedding matrix and its dropout-wrapped copy (renamed in v1)
    new_wgts['0.encoder.weight'] = old_wgts['0.encoder.weight']
    new_wgts['0.encoder_dp.emb.weight'] = old_wgts['0.encoder_with_dropout.embed.weight']
    # Same key mapping repeated for each of the 3 LSTM layers
    for i in range(3):
        old, new = f'0.rnns.{i}.module.', f'0.rnns.{i}.'
        new_wgts[new + 'weight_hh_l0_raw'] = old_wgts[old + 'weight_hh_l0_raw']
        new_wgts[new + 'module.weight_ih_l0'] = old_wgts[old + 'weight_ih_l0']
        new_wgts[new + 'module.weight_hh_l0'] = old_wgts[old + 'weight_hh_l0_raw']
        new_wgts[new + 'module.bias_ih_l0'] = old_wgts[old + 'bias_ih_l0']
        new_wgts[new + 'module.bias_hh_l0'] = old_wgts[old + 'bias_hh_l0']
    # The .h5 export has no decoder bias, so use a random one; its size is read
    # from the embedding matrix instead of hard-coding 60002
    vocab_sz = old_wgts['0.encoder.weight'].shape[0]
    new_wgts['1.decoder.bias'] = torch.rand(vocab_sz)

    torch.save(new_wgts, os.path.join(path_to_save_converted_model, 'converted_model.pth'))

I will try the LM and see how it does in my tests.

imaginary · March 1, 2019, 2:42pm

I'm unsure about the key names in the exported models, but can’t you take advantage of weight tying so you don’t have to set the decoder weights to a random tensor? Maybe @sgugger could clarify whether that applies here.
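Weight tying means the decoder reuses the embedding matrix, so the decoder weight can be taken from '0.encoder.weight'; the decoder bias, however, is a separate vector that tying does not recover. A minimal sketch, assuming the v1 key names used in the converter above; fill_tied_decoder is a hypothetical helper, and plain nested lists stand in for tensors so it runs without PyTorch (with real tensors the bias would be something like torch.zeros(vocab_sz)). Using a constant bias instead of a random one is a choice of this sketch, not something confirmed in the thread.

```python
from collections import OrderedDict


def fill_tied_decoder(state, bias_init=0.0):
    """Hypothetical helper: with tied weights the decoder matrix is the
    embedding matrix, so reuse it; the bias cannot be recovered from tying,
    so a missing bias is filled with a constant."""
    new_state = OrderedDict(state)  # shallow copy; the input dict is not modified
    embedding = state['0.encoder.weight']
    new_state['1.decoder.weight'] = embedding  # same object, as tying implies
    vocab_sz = len(embedding)  # one embedding row per vocab entry
    new_state.setdefault('1.decoder.bias', [bias_init] * vocab_sz)
    return new_state
```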

Do let us know how importing the model into v1 works out. If you do keep the random weights, you’ll probably have to do quite a bit of fine-tuning and might not get to the same perplexity as @Andreas_Daiminger.

This is all part of the reason why I trained my LM on v1 and not 0.7.

Andreas_Daiminger · March 1, 2019, 4:55pm

@imaginary Could you please share the code you used to train your LM with fastai v1? Then I can train a new LM in the next few days.

cduguet · March 17, 2019, 1:33pm

Hey guys! I just started training a TransformerXL model using mostly the default fastai parameters. After 15 hours, I feel like I’m getting nowhere. These are my results so far.

I’m basing my work heavily on what Francisco did at https://github.com/fpingham.
A very first version of my repo is at https://github.com/cduguet/ulm-es

Total time: 14:31:08

epoch	train_loss	valid_loss	accuracy	time
1	6.409617	6.382536	0.119674	2:57:11
2	6.361414	6.385390	0.119674	2:57:34
3	6.377438	6.374975	0.119674	2:56:55
4	6.390207	6.361238	0.119674	2:50:20
5	6.383617	6.347020	0.119674	2:49:06

cduguet · March 21, 2019, 10:24pm

I’ve managed to solve my problems from above by following the hyperparams discussed in https://forums.fast.ai/t/training-transformerxl/40104.

I am currently achieving a perplexity of 29.99 with a 60k vocab on TransformerXL, using the fastai v1 library. I’ve trained for 14 epochs so far.
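Perplexity is the exponential of the mean per-token cross-entropy loss (natural log). Assuming the valid_loss reported by fastai is that mean cross-entropy, the quick conversion below relates the numbers in this thread: a perplexity of 29.99 corresponds to a validation loss of about 3.40, while the stuck run earlier (valid_loss around 6.35) corresponds to a perplexity near 570. The function names are illustrative.

```python
import math


def perplexity(cross_entropy_loss):
    """Perplexity from a mean per-token cross-entropy measured in nats."""
    return math.exp(cross_entropy_loss)


def loss_for_perplexity(ppl):
    """Inverse: the mean cross-entropy (nats) that yields perplexity `ppl`."""
    return math.log(ppl)
```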

The notebook (currently bare-bones) can be found here: https://github.com/cduguet/ulmfit-es/blob/master/ULMFit-TransfXL.ipynb

EDIT: I exported the learner (including weights) for inference. It is available at this link.

cduguet · March 22, 2019, 12:58pm

Also, does anyone know where to find another huge corpus in Spanish, ideally informal Spanish like WebText?
I’ve seen this and this repo, but I don't know how to efficiently filter for Spanish text other than by domain.
I’m asking because I started overfitting after 14 epochs.
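One cheap alternative to domain filtering is a stopword-ratio heuristic: Spanish prose is dense in a small set of function words. The sketch below is hypothetical; the word list and the 0.2 threshold are assumptions, and it will confuse Spanish with closely related languages such as Portuguese, so a dedicated language-identification model would be more reliable on a web-scale corpus.

```python
import re

# Small, hand-picked list of very common Spanish function words (illustrative, not exhaustive)
SPANISH_STOPWORDS = {
    "de", "la", "que", "el", "en", "y", "a", "los", "se", "del", "las", "un",
    "por", "con", "no", "una", "su", "para", "es", "al", "lo", "como", "más",
    "pero", "sus", "le", "ya", "o", "este", "sí", "porque", "esta", "muy",
}

WORD_RE = re.compile(r"\w+")  # \w matches accented letters in Python 3 str patterns


def spanish_ratio(text):
    """Fraction of words in `text` that are common Spanish function words."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(w in SPANISH_STOPWORDS for w in words) / len(words)


def looks_spanish(text, threshold=0.2):
    """Keep a document if enough of its words are Spanish function words."""
    return spanish_ratio(text) >= threshold
```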

Bliss · March 22, 2019, 11:19pm

Hi,
This is great work, thanks a lot.
Do you know if anyone has trained these models further since December 2018?

Regards

juliannicolas90 · April 3, 2019, 4:08am

Could you please tell me how to use your pretrained weights for inference?
I’ve been trying different ways but can’t seem to get it working.
Thank you.

cduguet · April 3, 2019, 10:16am

You can load the learner with learn = load_learner(path, fname='spanish.pkl'), where path is the directory containing your spanish.pkl file.
If you run into version problems, note that I saved these models with fastai 1.0.50pre1.

juliannicolas90 · April 4, 2019, 2:55am

Thank you, that worked.
I can now use the model as a language model.
I have another question that may be a little naive.
If I want to use this pretrained model for classification on another dataset (for example, a medical dataset), is there a correct way of doing this? Right now I am artificially setting the vocab_sz of the classification learner to the size of the language model's vocab, because if I don’t, I get an error complaining that the weights have different sizes.

cduguet · April 4, 2019, 10:47am

You should use the same vocab you used for pretraining. Look at Lesson 3 of the course to see the correct way of doing this.
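The size-mismatch error happens because the embedding matrix has one row per token of the pretraining vocab. Forcing vocab_sz to match makes the shapes agree, but if the classifier data was numericalized with a new vocab built from the new dataset, each id would point at the embedding row of an unrelated word. A minimal sketch of numericalizing new text against the pretrained index-to-string list; numericalize is a hypothetical name, and using 'xxunk' as the unknown token is an assumption based on fastai v1's default.

```python
def numericalize(tokens, pretrained_itos, unk_token="xxunk"):
    """Map tokens to ids using the pretraining vocab; tokens the pretrained
    model never saw fall back to the unknown-token id."""
    stoi = {tok: i for i, tok in enumerate(pretrained_itos)}
    unk_id = stoi[unk_token]
    return [stoi.get(tok, unk_id) for tok in tokens]
```

With this approach, domain-specific words (e.g. medical terms) that are absent from the Wikipedia vocab all map to the unknown id.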

juliannicolas90 · April 9, 2019, 10:48pm

Thank you very much. That helped a lot.

lesscomfortable · May 4, 2019, 2:45pm

Hey Cristian, could you kindly send me the link to the weights? The one here seems to be outdated.

cduguet · May 5, 2019, 7:56am

Thanks for letting me know! I’ve put the file back in Dropbox. You should be able to download it now.
I’ve realized something went wrong during my original tokenization (I still don’t know what it was), and the model was training on a much, much smaller DataBunch than it should have been. I should have known, since I never ran into memory problems XD. The metrics are still correct, though (29.99 perplexity), but with the bigger dataset the model should generalize better.

I am running the whole thing again, with the whole dataset and trying mixed precision (I've never trained in fp16 for NLP before). It will train for a couple more days, though.