ULMFiT - German

I think there are only three predefined architectures you can load for the time being; see the fastai docs here:
https://docs.fast.ai/text.learner.html#language_model_learner

Trying to do arch=DE_model wouldn’t work, according to the docs. I guess you’ll need arch=AWD_LSTM, and then you would load your custom DE_model via
load_pretrained (https://docs.fast.ai/text.learner.html#RNNLearner.load_pretrained)
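
A minimal sketch of that approach, assuming fastai v1 and an already-built TextLMDataBunch; the weight and vocab file names below are placeholders:

```python
from fastai.text import *

# data_lm: a TextLMDataBunch built from your German corpus (assumed to exist)
learn = language_model_learner(data_lm, arch=AWD_LSTM, pretrained=False)

# Load your custom German weights and itos vocab; both file names are placeholders
learn.load_pretrained('de_wiki_wt.pth', 'de_wiki_itos.pkl')
```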

That’s all for now; I’ll post updates if I can get the pretrained language model to work.

I pretrained on German Wikipedia data and scraped about 60k German Amazon reviews to ultimately create a sentiment classifier. It achieves about 93% accuracy on the training and validation sets as well as on an independent test set, so I’m happy with that.

If it’s helpful for anyone, I’ll upload my pretrained model here.

Hi,
that would be wonderful! How long did it take you to train?

Hi @jyr1,

I’d like to try it out, thanks :slight_smile:

Please do an upload!

Hi @jyr1,
It would be a great help to me and to the community.
Please provide a link to your work.
The pretrained model discussed here isn’t working for me either.
Thanks in advance :slight_smile:
Regards, Pappu

About 12 hours on a Tesla P100 (Google Cloud). To be honest, I don’t have many other details, as I was merely interested in sentiment classification and not the preliminary steps.

Anyway, here’s the link; the vocab size is 30k. You can load the model directly in language_model_learner using the pretrained_fnames argument.

https://drive.google.com/open?id=1gkuY3Tz6LBmcehAnZ95jssV80CBQh7L1
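
In case it saves someone a lookup, here is a rough sketch of loading it that way, assuming fastai v1; the file names below are placeholders for the downloaded weights and vocab files:

```python
from fastai.text import *

# data_lm: a TextLMDataBunch built from your own German texts (assumed to exist)
# Put the downloaded weights (.pth) and vocab (.pkl) into data_lm.path/'models'
# and pass their names without extensions; the names below are placeholders.
learn = language_model_learner(data_lm, AWD_LSTM,
                               pretrained_fnames=('de_model', 'de_itos'),
                               drop_mult=0.3)
```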

Hi @jyr1,
Thanks a lot for sharing; I highly appreciate your work.
Could you also share a link to the Amazon reviews, so I can check against that dataset?
Regards,
Pappu

We have 3 pretrained models for German and we are working on incorporating them into the fastai library. Two of them use SentencePiece and one uses a word-level vocabulary. The only issue with our models is that they use QRNN, which is faster than LSTM, but you need a functional CUDA compiler in your path so that fastai can build the necessary modules for you.
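
For reference, a sketch of how a QRNN language model can be requested in fastai v1 (assuming a recent fastai v1, an existing data_lm, and a working CUDA toolchain on the PATH):

```python
from fastai.text import *

# Copy the default AWD-LSTM config and switch the RNN cells to QRNN.
config = awd_lstm_lm_config.copy()
config['qrnn'] = True

# On first use fastai builds the QRNN CUDA modules, which is why a
# functional CUDA compiler (nvcc plus a compatible gcc) is required.
learn = language_model_learner(data_lm, AWD_LSTM, config=config, pretrained=False)
```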

The issues you are experiencing with the previous models are most likely caused by changes to the fastai library. If you can’t wait, go to https://github.com/n-waves/ulmfit-multilingual; there is a version of ULMFiT extended with SentencePiece, and we have pretrained models here: https://drive.google.com/drive/u/0/folders/1t3DqH0ZJC2gDMKEq9vTa5jqxBiVAnClR

We are working to address this, stay tuned :slight_smile:

Apparently BERT and Facebook’s LASER have worse performance than ULMFiT on the classification tasks we checked. Have a look at the results here: https://github.com/n-waves/ulmfit-multilingual/blob/master/results/MLDoc.md

Really cool. Why did you use only 30k words?

Thank you for sharing, that’s great! Running your model on Colab seems to be working without hiccups so far.
Have a great day & bye
Johannes

Cool, can you share your notebooks so others can see how to run it?

Piotr,
Thank you for these helpful pointers! I am definitely looking forward to seeing the Model Zoo populated! :wink:
Can’t wait to check out the QRNN pretrained LMs you generously pointed to. I’m not sure yet if I know how to surmount the CUDA obstacles you are indicating, but I will try (I have a GCP account as well).

All the Best from Geneva, keep up the great work

There is a chance it will work out of the box on Colab; the issue is caused by CUDA 9 not supporting the newest gcc. To fix it I had to install gcc version 5.

Yes, of course, I will, as soon as I have something more to show. Colab just finished the first learning rate search (which is further than I got with the other pretrained models, thank you @jyr1).

Quick sanity check Piotr. I’m currently rebuilding my old model and it takes a lot longer than previously. How long did it take you to train 1 epoch for the German LM with fast.ai v1 (and on what hardware)?
It currently takes me 15h and previously it was 2-3h. I would appreciate a quick “nope, we trained one epoch quite quickly” so that I know I can keep searching for the cause :smiley:

~1h for QRNN and 2h for LSTM, all 4 layers, on a 1080 Ti. You can check the logs here; some of them include the training time, as Sylvain added that to the progress bar.
Most of the training was done on either a 1080 Ti or a V100.

Thank you all for sharing!

I’m now using the wiki model, which fixed my problem. By the way, I’m doing an evaluation of status texts.

Thank you all for your help!!!

Bye :slight_smile:

Hi All,

I have used the pre-trained model from @jyr1 on the datasets from GermEval-2018-Data.
The dataset contains 5,009 tweets as the training set and ~3,300 as the test set.
The model achieved between 66% and 70% accuracy, which is barely better than always predicting the majority class, as the labels are in a 2:1 ratio (OTHER:OFFENSE).
Could any of you post your results if your model achieved a better score?
As @jyr1 wrote earlier, his model achieved 93% accuracy on the Amazon review dataset. Could you (@jyr1) apply your model to the dataset linked above and tell us whether it achieves such high accuracy on Twitter data as well?
That would be great!!
:slight_smile:

I’ll quickly summarize the results. The goal was to compare ULMFiT’s sample efficiency to other methods. Howard and Ruder call the ULMFiT method “extremely” sample-efficient in their paper. I got different results for the 10kGNAD.

To evaluate the sample-efficiency I trained ten models for each of nine subset sizes ranging from 1% to 100%. I report the average error rate for the fastText library, a Support Vector Machine (SVM), a TensorFlow NN, and the ULMFiT method using sub-word tokenization.

For the smaller subsets the TensorFlow NN has the highest sample-efficiency; for the larger subsets, starting from 10%, the SVM outperforms the others. The ULMFiT method has a higher sample-efficiency only on the 5% subset. I can’t say that the ULMFiT method is “extremely” sample-efficient on the 10kGNAD.

Keep in mind that I was quite limited in terms of GPU power, so someone might be able to find better hyperparameters than I did. Additionally, experiments on one dataset are hardly representative of the German language or of other languages.

I’ve shared my scripts here.

I didn’t share a classifier, only a pretrained model. You’d still have to fine-tune it and actually create the classifier. Or did you do this? It’s not entirely clear to me. Applying the classifier I trained on Amazon data doesn’t make sense, as it would distinguish negative from positive, but not offensive from non-offensive (you can be very negative but not offensive, for instance).
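
For anyone unsure what that fine-tuning step looks like, a rough sketch with fastai v1 (data_lm, data_clas, and all file names are placeholders; data_clas should share data_lm’s vocab):

```python
from fastai.text import *

# 1) Fine-tune the pretrained LM on your target-domain texts, then save its encoder.
learn_lm = language_model_learner(data_lm, AWD_LSTM,
                                  pretrained_fnames=('de_model', 'de_itos'))
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.save_encoder('ft_enc')  # encoder name is a placeholder

# 2) Build the classifier on top of the fine-tuned encoder and train it.
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clas.load_encoder('ft_enc')
learn_clas.fit_one_cycle(1, 1e-2)
```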

Hi @jyr1,
So you pretrained the model on Amazon reviews, right?