Language Model Zoo 🦍

(Kaspar Lund) #376

I have seen this phenomenon several times when training TransformerXL. @sgugger, do you have any idea what is going on?




That would suggest the learning rate is too high, since it just breaks.


(Kaspar Lund) #378

OK, but to be more precise: why does it occur “between” two epochs?



Why would it occur at a specific time? I’m confused about what you’re asking.


(Piotr Czapla) #380

Do you mean that you get the crash when epoch starts again?
Does it do that each time?

Interesting idea to use language modeling for transfer learning to a machine translation task.
Some people tried that with BERT and failed.

There are other concepts, like back translation, that let you use large monolingual corpora in MT, and we know they work well. Check Sebastian's list of the recent approaches to MT.


(benedikt herudek) #381

Thanks for this link, @piotr.czapla!


(Kaspar Lund) #382

Reducing the learning rate to 1e-4 gets the training back under control. However, I have seen this jump in training loss more often between epochs than within an epoch. That makes me wonder whether the state of the network/buffers is restored correctly between validation and training.


(Kaspar Lund) #383

Just a heads-up concerning TransformerXL: although I have not done an exhaustive hyperparameter search, it does look like a slow start (via pct_start) accelerates convergence, by a lot.

%time learn.fit_one_cycle(cyc_len=epochs, max_lr=5e-4, moms=(0.95,0.85), wd=1e-3, pct_start=0.01)
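For intuition, pct_start is the fraction of iterations spent ramping the learning rate up before annealing it back down. Below is a minimal sketch of the one-cycle shape, assuming fastai v1's cosine annealing and its default div_factor=25; the function names here are illustrative, not the library's API:

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation from `start` to `end` as `pct` goes from 0 to 1."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div=1e4):
    """Learning rate at `step` of a one-cycle schedule (sketch)."""
    warm = max(1, int(total_steps * pct_start))
    if step < warm:
        # Warm-up: ramp from max_lr/div_factor up to max_lr
        return annealing_cos(max_lr / div_factor, max_lr, step / warm)
    # Cool-down: anneal from max_lr down to a tiny final value
    pct = (step - warm) / max(1, total_steps - warm)
    return annealing_cos(max_lr, max_lr / final_div, pct)
```

With pct_start=0.01 out of, say, 1000 steps, the LR reaches max_lr after only 10 steps and the model spends almost the whole run annealing down; with the default 0.3 it would still be ramping at step 10.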


(Davide Boschetto) #384

I’m about to start something for Italian since the assigned user has been inactive since last June, so I’m here asking: is the first post updated, with all the good tips working for the latest fastai version, or should I use some old version to make it work smoothly?


(Vinit Sutar) #385


I’m currently working on a dataset that is tabular in nature. It contains 7 emotion categories.
Given data:
id, sentence, emotion
id, emotion.

I want to use ULMFiT on this dataset to predict the emotion of each id based on its corresponding sentence.

I’m confused about how to proceed after reading the CSV file.


(hanan) #386

Hey all,
I trained the model on Hebrew Wikipedia.
Can you update the status?
And should I just open a new topic?



Interested in this as well. I have been thinking about it for ages.


(Piotr Czapla) #388

The Italian models were done as part of an effort to compare ULMFiT against BERT. I need to find some time to move the modifications into fastai, but for the time being the models can be found here:
They are working with
I would love to see how they work on Italian datasets other than MLDoc.

Please open the thread. Hebrew has not been tackled yet, as far as I know. Have you found a suitable dataset to test ULMFiT against?

@miko, @DavideBoschetto, we have trained 2 Italian language models and one classification model on MLDoc. With that done, there are still some things that would be helpful to experiment with:

  • test the current models on datasets other than MLDoc - it would be best if you added such a dataset in the same way we added mldoc and cls to ulmfit-multilingual.
  • search for better hyperparameters for Italian. We tested only 3 models in a rather standard way; maybe you can find a better set.

(hanan) #389

Hey, actually I just broke the record on a benchmark released last year (by almost 2%!). I'm in touch now with the author to validate my results.
I opened a thread: ULMFiT - Hebrew


(Carlos Vouking) #390

‘nvidia-smi dmon’ and ‘nvidia-smi pmon’ commands could also be helpful.
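For example (flag meanings per nvidia-smi's built-in help; exact support varies by driver version):

```shell
nvidia-smi dmon -s um -d 2   # stream (u)tilization and (m)emory stats every 2 s
nvidia-smi pmon -c 5         # per-process GPU usage, 5 samples then exit
```

dmon gives a rolling device-level view (handy for spotting an underfed GPU), while pmon breaks usage down per process.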


(Kristian Rother) #391

This sounds like a straightforward ULMFiT problem, if I understand you correctly. My guess would be the approach is:

  1. Build or use a pre-existing language model (like Wikitext103)
  2. Transform your dataset from [id, sentence, emotion] to [0, sentence] because you train your language model on unlabeled data. Also split it into train/validation
  3. Use the new dataset to finetune (load the LM weights from 1, retrain). Save the model, save the encoder
  4. Load the encoder and train a classifier with your [id, sentence, emotion] dataset (since the emotion is the label)
  5. Use predict to write your [id, emotion] target. You have to map the ids somehow.

Also note that this is multi-class classification (7 emotions), not binary as in most of the default examples. Check out the documentation or the RNN video from 2019 (lesson 3, IIRC) and the corresponding notebook.
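Step 2 of the list above can be sketched in plain Python; the sample rows here are made up purely for illustration:

```python
import csv
import io
import random

# Made-up sample in the [id, sentence, emotion] format described above
raw = """id,sentence,emotion
1,I am so happy today,joy
2,This is terrible news,sadness
3,What a surprise,surprise
4,I did not expect that at all,fear
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# The language model trains on unlabeled text, so drop ids and emotions.
# LM data loaders typically still expect a label column, hence the dummy 0.
lm_rows = [(0, r["sentence"]) for r in rows]

# Simple random train/validation split (90/10 is a common choice)
random.seed(42)
random.shuffle(lm_rows)
cut = max(1, int(len(lm_rows) * 0.9))
train, valid = lm_rows[:cut], lm_rows[cut:]
```

The labeled [id, sentence, emotion] rows are kept aside untouched for the classifier stage (step 4).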


(Fred Guth) #392

I was able to create a pt-br LM and saved the model .pth and the itos.pkl.
Now I want to classify a different corpus using my pretrained language model. I was not able to reproduce the IMDB example because it does not show how to load a custom model; it assumes you are working in English and downloads the pretrained WikiText-103 English LM.

Is there a notebook showing how to classify using your pretrained lm?



I would like to ask: do you create translation models based on language models, such as German to English, like we have in Google Translate? Would that also be a secondary purpose of this thread?


(Johannes Lackner) #394

I loaded the model weights (.pth) and itos.pkl from a German LM into my LM learner like this:

You train the LM, then save the encoder part. Then you set up your classifier (as described in the course v3 IMDB notebook), load your LM encoder into it and classify:

learn = text_classifier_learner(data, AWD_LSTM, pretrained = False, drop_mult=0.05)
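For reference, the LM-loading step mentioned above might look roughly like this in fastai v1. This is an untested sketch: pretrained_fnames takes the stems of your saved weights and itos files (the names 'german_lm', 'itos', and 'ft_enc' below are placeholders), and it assumes data_lm / data_clas DataBunches are already built:

```python
# Sketch, assuming fastai v1 and that the weight/itos files
# sit under data_lm.path/'models'. File stems are placeholders.
learn_lm = language_model_learner(
    data_lm, AWD_LSTM, drop_mult=0.3,
    pretrained_fnames=['german_lm', 'itos'])  # loads .pth weights + itos.pkl
learn_lm.fit_one_cycle(1, 1e-3)               # fine-tune on the target corpus
learn_lm.save_encoder('ft_enc')               # keep the encoder only

# The classifier then reuses the fine-tuned encoder
learn_clf = text_classifier_learner(data_clas, AWD_LSTM,
                                    pretrained=False, drop_mult=0.05)
learn_clf.load_encoder('ft_enc')
```

The key point is that only the encoder is transferred; the classifier head is trained from scratch on the labeled data.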


(Serge Mankovski) #395

Is there a repository for the models? I am training the bacterial genome language model that was shared by @KarlH, and it seems that I am getting somewhere.

The model did not do very well on a small sample of genomes, but increasing the number of genomes from a couple of dozen to a few thousand made a difference. This model might turn out to be useful for bioinformatics after all. But boy, is it training slowly… it is like watching paint dry 🙂