ULMFiT - Spanish

@imaginary
Wow. So I am not up to date. I definitely want to give attention-based models a shot. We can train on my GPU!

Yes. Good question about when to stop training. I would say keep going as long as you have the resources and the loss is going down.


I finally managed to import your LM model into v1. I still have to run some tests, but I doubt the performance will be as good, since the decoder bias is not present in the .h5 files; when converting to the v1 format I had to add a random tensor as the decoder bias.

Here is my modified version of the converter for reference for anyone else facing conversion issues:

from collections import OrderedDict

import torch


def convert(path_to_old_model, path_to_save_converted_model):
    """
    path_to_old_model is the path to the old model (before fastai v1)
    and path_to_save_converted_model is the directory where the
    converted model will be stored.
    """
    old_wgts = torch.load(path_to_old_model, map_location=lambda storage, loc: storage)
    new_wgts = OrderedDict()

    # Embedding layer (and its dropout copy): same weights, new key names.
    new_wgts['0.encoder.weight'] = old_wgts['0.encoder.weight']
    new_wgts['0.encoder_dp.emb.weight'] = old_wgts['0.encoder_with_dropout.embed.weight']

    # The three AWD-LSTM layers: rename the raw/weight-dropped weights and the biases.
    for i in range(3):
        new_wgts[f'0.rnns.{i}.weight_hh_l0_raw'] = old_wgts[f'0.rnns.{i}.module.weight_hh_l0_raw']
        new_wgts[f'0.rnns.{i}.module.weight_ih_l0'] = old_wgts[f'0.rnns.{i}.module.weight_ih_l0']
        new_wgts[f'0.rnns.{i}.module.weight_hh_l0'] = old_wgts[f'0.rnns.{i}.module.weight_hh_l0_raw']
        new_wgts[f'0.rnns.{i}.module.bias_ih_l0'] = old_wgts[f'0.rnns.{i}.module.bias_ih_l0']
        new_wgts[f'0.rnns.{i}.module.bias_hh_l0'] = old_wgts[f'0.rnns.{i}.module.bias_hh_l0']

    # The old .h5 files have no decoder bias, so fill it with a random tensor
    # (60002 = vocab size). This will need fine-tuning to recover performance.
    new_wgts['1.decoder.bias'] = torch.rand(60002)

    torch.save(new_wgts, path_to_save_converted_model + 'converted_model.pth')
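
Usage would then be something like this (the paths are placeholders, assuming the old weights file loads with torch.load):

convert('models/spanish_lm_old.h5', 'models/')   # writes models/converted_model.pth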

I will try the LM and see how it performs in my tests.

I’m unsure about the naming in the exported models, but can’t you take advantage of weight tying so you don’t have to set the decoder weights to a random tensor? Maybe @sgugger could clarify whether that applies here.
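
For what it’s worth, here is a minimal sketch of what that could look like in the converter above, assuming the v1 state dict stores the tied output matrix under the key '1.decoder.weight' (note the decoder bias is a separate parameter and is not covered by tying):

# hypothetical extra line for convert(): reuse the embedding matrix as the
# (tied) decoder weight instead of initialising it randomly
new_wgts['1.decoder.weight'] = old_wgts['0.encoder.weight']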

Do let us know how importing the model to v1 works out. If you do keep the random weights, you’ll probably have to do quite a bit of fine tuning and might not get to the same perplexity as @Andreas_Daiminger.

This is all part of the reason why I trained my LM on v1 and not 0.7.

@imaginary Can you please share the code you used to train your LM with fastai v1? I’d like to train a new LM myself one of these days.

Hey guys! I just started training a TransformerXL using mostly the default fastai parameters. After 15 hours, I feel like I’m getting nowhere. These are my results so far.

My work is heavily based on what Francisco did at https://github.com/fpingham.
A first version of my repo is at https://github.com/cduguet/ulm-es

Total time: 14:31:08

epoch train_loss valid_loss accuracy time
1 6.409617 6.382536 0.119674 2:57:11
2 6.361414 6.385390 0.119674 2:57:34
3 6.377438 6.374975 0.119674 2:56:55
4 6.390207 6.361238 0.119674 2:50:20
5 6.383617 6.347020 0.119674 2:49:06

I’ve managed to solve my problems from above by following the hyperparams discussed in https://forums.fast.ai/t/training-transformerxl/40104.

I am currently achieving 29.99 perplexity with a 60k vocab on TransformerXL, using the fastai v1 library. I’ve trained for 14 epochs so far.
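
(For context, the perplexity reported here is just the exponential of the per-token cross-entropy loss that fastai shows as valid_loss, so 29.99 corresponds to a validation loss of roughly 3.40:)

import math

valid_loss = 3.40                  # approximate value implied by the reported perplexity
perplexity = math.exp(valid_loss)  # ≈ 29.96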

The notebook (currently in bare bones) can be found here: https://github.com/cduguet/ulmfit-es/blob/master/ULMFit-TransfXL.ipynb

EDIT: I exported the learner (including weights) for inference. It is available at this link.


Also, does anyone know where to find another huge corpus in Spanish? Ideally informal Spanish, like WebText.
I’ve seen this and this repo, but I wouldn’t know how to efficiently filter for Spanish text other than by domain filtering; one idea is sketched below.
I’m asking because I started overfitting after 14 epochs.
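
One option I could try is per-line language detection. A rough sketch with the langdetect package (the package choice is just an assumption here; fastText’s lid.176 model would be another common choice, and short lines will often be misclassified):

from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def keep_spanish(lines):
    """Yield only the lines that langdetect classifies as Spanish."""
    for line in lines:
        try:
            if detect(line) == 'es':
                yield line
        except LangDetectException:
            # raised when a line has no usable text (e.g. only digits or punctuation)
            continue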

Hi,
This is great work, thanks a lot.
Do you know if anyone has further trained these models after Dec18?

Regards

Could you please tell me how to use your pretrained weights for inference?
I’ve been trying different ways but can’t seem to get it working.
Thank you.

You load the learner with learn = load_learner(path, fname='spanish.pkl'), where path is the directory containing your spanish.pkl file.
If you run into a versioning problem: I saved these models using fastai 1.0.50pre1.
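
Once it is loaded you can sanity-check it by generating some text, roughly like this (the prompt and sampling parameters are arbitrary):

from fastai.text import *   # fastai v1

path = Path('.')            # directory containing spanish.pkl
learn = load_learner(path, fname='spanish.pkl')
# generate 30 words after the prompt; temperature < 1 makes the output less random
print(learn.predict('Había una vez', n_words=30, temperature=0.75))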


Thank you, that worked.
I can now use the model as a language model.
I have another question that may be a little naive.
If I want to use this pretrained model for classification on another dataset (for example, a medical dataset), is there a correct way of doing this? Right now I am artificially setting the vocab_sz of the classification learner to the size of the vocab from the language model, because if I don’t, I get an error complaining about the weights having different sizes.

You should reuse the same vocab you used for pretraining. Look at Lesson 3 of the course to see the correct way of doing this.
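
The pattern from the course looks roughly like this in fastai v1 (the file names, column names and hyperparameters below are placeholders, not values from this thread):

from fastai.text import *   # fastai v1

path = Path('data')  # placeholder: folder with medical.csv and a models/ subfolder

# Language-model data for your own corpus, plus the pretrained Spanish weights.
# pretrained_fnames expects 'spanish_wt.pth' and 'spanish_itos.pkl' in models/.
data_lm = TextLMDataBunch.from_csv(path, 'medical.csv', text_cols='text')
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3,
                                  pretrained_fnames=['spanish_wt', 'spanish_itos'])
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.save_encoder('ft_enc')      # keep the fine-tuned encoder for the classifier

# The classifier DataBunch must reuse the language model's vocab.
data_clas = TextClasDataBunch.from_csv(path, 'medical.csv', text_cols='text',
                                       label_cols='label', vocab=data_lm.vocab, bs=32)
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clas.load_encoder('ft_enc')
learn_clas.fit_one_cycle(1, 2e-2)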

Thank you very much. That helped a lot.


Hey Cristian, could you kindly send me the link for the weights? The one here seems to be outdated.

Thanks for letting me know! I’ve put the file back in Dropbox. You should be able to download it now.
I’ve realized something went wrong during my original tokenization (I still don’t know what it was), and the model was training with a much, much smaller DataBunch than it should have. I should have known, since I never ran into memory problems XD. The metrics are still correct, though (29.99 perplexity), but with the bigger dataset I should be able to generalize better.

I am running the whole thing again, with the full dataset and with mixed precision (I had never tried fp16 for NLP before). It will take a couple more days to train, though.
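
In fastai v1 mixed precision is a one-line change on the learner (a sketch, assuming the learner is called learn):

# train in half precision; fastai keeps an fp32 master copy of the weights
learn = learn.to_fp16()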

Great, that should improve val_loss! When you finish training, please upload the itos file as well, since it is what allows fastai to build the vocabulary for a new fine-tuning dataset.
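
For reference, the itos file is just the pickled index-to-token list of the vocab, so exporting it is something like this (the file name is a placeholder):

import pickle

# itos = 'index to string': the list mapping every vocab id back to its token
with open('itos_es.pkl', 'wb') as f:
    pickle.dump(data_lm.vocab.itos, f)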

BTW: I am down to collaborate and do some of the training myself if that would be helpful. I think having a great pretrained Spanish language model is super important for a number of downstream tasks.

@imaginary @Andreas_Daiminger would you be interested in collaborating on training a SOTA TransformerXL for Spanish? I think our objective should be to improve on the impressive 18.5 perplexity achieved by Andreas with an LSTM.


I would definitely be interested in training Transformer XL in Spanish. It is much more computationally intensive than ULMFiT though.
Does anybody know if there is a pre-trained Spanish BERT already?


I have 300 USD in Google credits; do you think this is enough? I would need to train a backwards model as well, since I am trying to create a model that generates rap lyrics, and this is necessary for rhymes.

That is about 200 hours on a GCP P100 instance. Hard to say how far you can get with that.
I know that BERT was trained with an insane amount of computation and on a huge dataset … not only Spanish Wikipedia. It might be very hard to compete with that. I also read that they will release a pretrained Spanish BERT at some point. Have you tried multilingual BERT for the task? Some researchers at my job are getting really good results with it for question answering.
