Language Model Zoo 🦍

(Kristian Rother) #391

This sounds like a straightforward ULMFiT problem, if I understand you correctly. My guess would be the approach is:

  1. Build or use a pre-existing language model (like Wikitext103)
  2. Transform your dataset from [id, sentence, emotion] to [0, sentence] because you train your language model on unlabeled data. Also split it into train/validation
  3. Use the new dataset to finetune (load the LM weights from 1, retrain). Save the model, save the encoder
  4. Load the encoder and train a classifier with your [id, sentence, emotion] dataset (since the emotion is the label)
  5. Use predict to write your [id, emotion] target. You have to map the ids somehow.

Also note that this is multilabel classification and not binary as in most default examples. Check out the documentation or the RNN video from 2019 (lesson 3 iirc) and the corresponding notebook.


(Fred Guth) #392

I was able to create a pt-br LM and have saved the model .pth and the itos.pkl.
Now I want to classify a different corpus and use my pretrained language model. I was not able to reproduce IMDB because it does not show how to load a model, it assumes you are doing it with english and download the pretrained wiki103 english lm.

Is there a notebook showing how to classify using your pretrained lm?



I would like to ask you do you create translation modules based on language modules? Such as German to English or so, like we have in Google Translate, would that also be the sub purpose of this thread?


(Johannes Lackner) #394

I loaded the model weights (.pth) & itos.pkl from a german LM into my LM learner like this:

You train the LM, then save the encoder part. Then you set up your classifier (as described in the course v3 IMDB notebook), load your LM encoder into it and classify:

learn = text_classifier_learner(data, AWD_LSTM, pretrained = False, drop_mult=0.05)


(Serge Mankovski) #395

Is there a repository for the models? I am training a bacterial genome language model that was shared by @KarlH and it seems that I am getting somewhere.

The model did not do very well on a small sample of genomes, but after increasing number of genomes from a couple of dozens to a few thousands made a difference. This model might turn out useful for bioinformatics after all. But boy, is it training slowly… it is like watching paint dry :slight_smile: