ULMFIT - Spanish

(Guille) #23

This afternoon I will try to import your model into v1 with the help of this function for converting the .h5 file to .pth. I hope it goes well.

0 Likes

(Andreas Daiminger) #24

Here is a link to download weights and integer to string mapping for the LM I trained on Spanish wiki articles. Let me know in case you give it a try.

1 Like

(Andreas Daiminger) #25

Cool. I did not see your post before. I trained on my own Titan Xp for 4 full days with a 60k vocab.

0 Likes

(JDV) #26

If you wanna load the QRNN model you need to set qrnn=True in order for it to load properly and set the weights accordingly. AFAIK, that’s the only thing you should be changing. Other than that it should work. Thankfully LMs can still be loaded now that new structures were introduced to fastai.

0 Likes

(JDV) #27

Thanks for the thorough reply, @Andreas_Daiminger

Those are truly impressive results for the LM, especially considering the (60k) vocab size. I, too, filtered out the short articles from Wikipedia and trained on that. I actually stopped after only 4 epochs for LSTMs, but I could continue training and see where that lands me. I only used a 30k vocab size though.

Something I’m curious about is when the training should stop when it comes to LMs. In all the stats I’ve seen posted it definitely looks like training could continue but I don’t know if it’d be a good idea once you’re past a certain threshold. It’s interesting that our models just seem to smash SOTAs(?) Is there any standard for Spanish LMs or comparable SOTA results? Is using just standard AWD LSTMs the current SOTA result for Spanish? So many questions :smiley:

Attention-based models are already in the library but as far as I’m aware no one has trained a better LM on them than on LSTMs or QRNNs. We could give it a shot if you want. I have some experience with the new v1 interface of fastai.

1 Like

(Andreas Daiminger) #28

@imaginary
Wow. So I am not up to date. I definitely want to give attention-based models a shot. We can train on my GPU!

Yes. Good question when to stop training. I would say keep going as long as you have the resources and the loss is going down.
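One way to make "keep going as long as the loss is going down" concrete is a patience rule on the validation loss. The sketch below is only an illustration in plain Python: `should_stop`, `patience`, and `min_delta` are hypothetical names, not part of fastai, and the losses in the usage lines are made up.

```python
def should_stop(valid_losses, patience=2, min_delta=0.0):
    """Return True when the last `patience` epochs failed to improve on the
    best earlier validation loss by more than `min_delta`."""
    if len(valid_losses) <= patience:
        # Not enough history yet to compare against.
        return False
    best_before = min(valid_losses[:-patience])
    recent_best = min(valid_losses[-patience:])
    return recent_best >= best_before - min_delta


# Made-up validation losses, one per epoch.
still_improving = [3.0, 2.5, 2.4, 2.3]
plateaued = [3.0, 2.5, 2.4, 2.45, 2.46]
print(should_stop(still_improving), should_stop(plateaued))
```

A larger `patience` tolerates noisy epochs at the cost of some wasted compute; a rising validation loss with a falling training loss is the usual sign of overfitting.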

1 Like

(Guille) #29

I finally managed to import your LM into v1. I still have to run some tests, but I highly doubt the performance will be as good: the decoder bias is not present in the .h5 files, so when converting to the v1 format I had to add a random tensor as the decoder bias.

Here is my modified version of the converter for reference for anyone else facing conversion issues:

import torch
from collections import OrderedDict

def convert(path_to_old_model, path_to_save_converted_model):
    """
    path_to_old_model is the path to the old model (before fastai v1)
    and
    path_to_save_converted_model is the path where the converted model is stored
    """
    # Load the old checkpoint onto the CPU, wherever it was originally saved.
    old_wgts = torch.load(path_to_old_model, map_location=lambda storage, loc: storage)
    new_wgts = OrderedDict()

    # Embedding weights.
    new_wgts['0.encoder.weight'] = old_wgts['0.encoder.weight']
    new_wgts['0.encoder_dp.emb.weight'] = old_wgts['0.encoder_with_dropout.embed.weight']

    # The three RNN layers share the same key layout, so map them in a loop.
    for i in range(3):
        old = f'0.rnns.{i}.module'
        new_wgts[f'0.rnns.{i}.weight_hh_l0_raw'] = old_wgts[f'{old}.weight_hh_l0_raw']
        new_wgts[f'0.rnns.{i}.module.weight_ih_l0'] = old_wgts[f'{old}.weight_ih_l0']
        new_wgts[f'0.rnns.{i}.module.weight_hh_l0'] = old_wgts[f'{old}.weight_hh_l0_raw']
        new_wgts[f'0.rnns.{i}.module.bias_ih_l0'] = old_wgts[f'{old}.bias_ih_l0']
        new_wgts[f'0.rnns.{i}.module.bias_hh_l0'] = old_wgts[f'{old}.bias_hh_l0']

    # The decoder bias is missing from the old file, so a random tensor is used
    # (60002 entries, presumably the vocabulary size of this model).
    new_wgts['1.decoder.bias'] = torch.rand(60002)

    # Note: the output path is built by plain concatenation, so
    # path_to_save_converted_model must end with a path separator.
    torch.save(new_wgts, path_to_save_converted_model+'converted_model.pth')
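For anyone adapting the converter to a different number of layers, the key renaming above can be separated from the torch I/O so it is easy to check on its own. This is only a sketch: `build_key_map` and `remap` are hypothetical helper names, the mapping simply mirrors the assignments in the function above, and the decoder bias still has to be handled separately.

```python
from collections import OrderedDict


def build_key_map(n_layers=3):
    """List (new_key, old_key) pairs mirroring the converter's assignments."""
    pairs = [
        ('0.encoder.weight', '0.encoder.weight'),
        ('0.encoder_dp.emb.weight', '0.encoder_with_dropout.embed.weight'),
    ]
    for i in range(n_layers):
        old = f'0.rnns.{i}.module'
        pairs += [
            (f'0.rnns.{i}.weight_hh_l0_raw', f'{old}.weight_hh_l0_raw'),
            (f'0.rnns.{i}.module.weight_ih_l0', f'{old}.weight_ih_l0'),
            (f'0.rnns.{i}.module.weight_hh_l0', f'{old}.weight_hh_l0_raw'),
            (f'0.rnns.{i}.module.bias_ih_l0', f'{old}.bias_ih_l0'),
            (f'0.rnns.{i}.module.bias_hh_l0', f'{old}.bias_hh_l0'),
        ]
    return pairs


def remap(old_wgts, key_map):
    """Build the renamed weight dict, failing loudly if an expected key is absent."""
    missing = [old for _, old in key_map if old not in old_wgts]
    if missing:
        raise KeyError(f'keys missing from the old checkpoint: {missing}')
    return OrderedDict((new, old_wgts[old]) for new, old in key_map)
```

Checking for missing keys up front gives a clearer error than a bare `KeyError` halfway through the conversion, which helps when the old checkpoint was exported with a different layout.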

I will try the LM and see how it works in my tests

0 Likes

(JDV) #30

I'm unsure about the labeling in the exported models, but can’t you take advantage of weight tying so you don’t have to set the decoder weights to a random tensor? Maybe @sgugger could clarify if that applies here.

Do let us know how importing the model to v1 works out. If you do keep the random weights, you’ll probably have to do quite a bit of fine tuning and might not get to the same perplexity as @Andreas_Daiminger.

This is all part of the reason why I trained my LM on v1 and not 0.7.

0 Likes

(Andreas Daiminger) #31

@imaginary Can you please share the code you used to train your LM with fastai v1? Then I can train a new LM in the next few days.

0 Likes

(Cristian Duguet) #32

Hey guys! I just started training TransformerXL using mostly the default fastai parameters. After 15 hours, I feel like I’m getting nowhere. These are my results so far.

My work is heavily based on what Francisco made at https://github.com/fpingham.
A very first version of my repo is at https://github.com/cduguet/ulm-es

Total time: 14:31:08

epoch  train_loss  valid_loss  accuracy  time
1      6.409617    6.382536    0.119674  2:57:11
2      6.361414    6.385390    0.119674  2:57:34
3      6.377438    6.374975    0.119674  2:56:55
4      6.390207    6.361238    0.119674  2:50:20
5      6.383617    6.347020    0.119674  2:49:06

0 Likes

(Cristian Duguet) #33

I’ve managed to solve my problems from above by following the hyperparams discussed in https://forums.fast.ai/t/training-transformerxl/40104.

I am currently achieving a perplexity of 29.99 with a 60k vocab on TransformerXL, using the fastai v1 library. I’ve trained for 14 epochs so far.
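For comparing this number with the validation losses posted earlier in the thread: perplexity is the exponential of the mean cross-entropy loss. The sketch below assumes the reported loss is a per-token cross-entropy in nats (natural log), which is my assumption about how the numbers were produced; the function names are just illustrative.

```python
import math


def perplexity(cross_entropy_loss):
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(cross_entropy_loss)


def loss_for_perplexity(ppl):
    """Inverse mapping: the cross-entropy (in nats) that yields a given perplexity."""
    return math.log(ppl)


# The stalled run from post #32 versus the perplexity reported here.
print(perplexity(6.347020))
print(loss_for_perplexity(29.99))
```

Under that assumption, a validation loss of 6.347 corresponds to a perplexity of roughly 570, while a perplexity of 29.99 corresponds to a loss of about 3.40.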

The notebook (currently bare-bones) can be found here: https://github.com/cduguet/ulmfit-es/blob/master/ULMFit-TransfXL.ipynb

EDIT: I exported the learner (including weights) for inference. It is available at this link

1 Like

(Cristian Duguet) #34

Also, does anyone know where to find another huge Spanish corpus, ideally informal Spanish like WebText?
I’ve seen this and this repo, but I wouldn’t know how to efficiently filter for Spanish text other than by domain filtering.
I’m asking because I started overfitting after 14 epochs.

0 Likes

#35

Hi,
This is great work, thanks a lot.
Do you know if anyone has further trained these models after Dec18?

Regards

0 Likes

(Julian) #36

Could you please tell me how to use your pretrained weights for inference?
I’ve been trying different ways but can’t seem to get it working.
Thank you.

0 Likes

(Cristian Duguet) #37

You load the learner with learn = load_learner(path, fname='spanish.pkl'), where path is the directory containing your spanish.pkl file.
If you run into a versioning problem: I saved these models using fastai 1.0.50pre1.

0 Likes

(Julian) #38

Thank you, that worked.
I can now use the model as a language model.
I have another question that may be a little naive.
If I want to use this pretrained model for classification with another dataset (for example, a medical dataset), is there a correct way of doing this? Right now I am doing it by artificially setting the vocab_sz of the classification learner to the same size as the vocab of the language model, because if I don’t, I get an error complaining about the weights having different sizes.

0 Likes

(Cristian Duguet) #39

You should use the same vocab you used for pretraining. Look at Lesson 3 of the course to see the correct way of doing this.
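To illustrate why the vocab has to match: each row of the pretrained embedding matrix belongs to one position in the pretraining vocab, so the new dataset must be turned into ids with that same token list, and words it does not contain fall back to the unknown token. The snippet below is a toy sketch in plain Python, not fastai code; `numericalize` and the tiny vocabulary are made up for illustration, and `xxunk` is used as the unknown token on the assumption that it matches the pretrained vocab.

```python
def numericalize(tokens, itos, unk_token='xxunk'):
    """Map tokens to ids using the pretraining vocab; unseen tokens map to unk."""
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    unk_idx = stoi[unk_token]
    return [stoi.get(tok, unk_idx) for tok in tokens]


# Toy stand-in for the integer-to-string mapping saved with the pretrained LM.
pretrained_itos = ['xxunk', 'xxpad', 'el', 'paciente', 'tiene', 'fiebre']

# A sentence from a new (e.g. medical) dataset with one out-of-vocab word.
ids = numericalize(['el', 'paciente', 'tiene', 'taquicardia'], pretrained_itos)
print(ids)
```

Because the ids index into the pretrained embedding, building a fresh vocab from the new dataset would change both the size of that matrix and the meaning of each row, which is consistent with the size-mismatch error described above.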

0 Likes

(Julian) #40

Thank you very much. That helped a lot.

0 Likes