ULMFiT Dutch fastai v2

Beginner_NLP · March 12, 2021, 12:26am

Hi everyone!

I’m new to fastai, as well as to deep learning in general :). I want to build a text classifier using the ULMFiT approach, and I was wondering if there are any pretrained language models available for Dutch. I found one for fastai v1, but I assume I won’t be able to use it with the new version of fastai? Thanks!

florianl · March 12, 2021, 9:21am

Hi,

Using the pretrained weights from fastai v1 should be possible with fastai v2. At some point the hidden dimension size changed, but that should also be fixable; see the thread “Language_model_learner not working as before?”.
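As a rough illustration of the hidden-size fix, here is a minimal sketch of how old weights might be loaded into a fastai v2 learner. It rests on some assumptions: that the old checkpoint was trained with `n_hid=1150` while the current default config uses 1152 (check the shapes in your own checkpoint), and that `dls`, `weights_path` and `vocab_path` are your own DataLoaders and files. The helper names `legacy_awd_lstm_config` and `load_legacy_lm` are made up for this example, and v1 checkpoints may still need further adjustments before they load cleanly.

```python
def legacy_awd_lstm_config(default_config, n_hid=1150):
    """Return a copy of `default_config` with the hidden size of the old weights.

    n_hid=1150 is an assumption about the old checkpoint, not a guarantee.
    """
    config = dict(default_config)  # copy, so the library default is not mutated
    config["n_hid"] = n_hid
    return config


def load_legacy_lm(dls, weights_path, vocab_path, n_hid=1150):
    """Build an AWD_LSTM language model learner and load older pretrained weights.

    `weights_path` (.pth) and `vocab_path` (.pkl) are hypothetical paths to the
    old weights and the vocabulary they were trained with.
    """
    # Imported lazily so the config helper above works without fastai installed.
    from fastai.text.all import AWD_LSTM, awd_lstm_lm_config, language_model_learner

    config = legacy_awd_lstm_config(awd_lstm_lm_config, n_hid)
    # pretrained=False: do not download the default English weights.
    learn = language_model_learner(dls, AWD_LSTM, config=config, pretrained=False)
    # Maps the old vocabulary onto the new one and loads the weights.
    learn.load_pretrained(weights_path, vocab_path)
    return learn
```

Calling `load_legacy_lm(dls, "old_weights.pth", "old_vocab.pkl")` would then return a learner ready for fine-tuning; the file names here are placeholders.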

If you have access to a GPU, pretraining the weights is not too hard. I recently released a GitHub repo for ULMFiT training:

If you have any questions, let me know.

Florian

florianl · March 12, 2021, 4:43pm

I’m improving my scripts and need some languages to experiment with, so I’ll probably train a Dutch (nl) language model over the weekend. I’ll keep you posted.

florianl · March 13, 2021, 1:01pm

I’ve trained the Dutch model. You’ll find the link in the repo. Let me know if you have any problems with it :).

Edit: I evaluated the language model on the 110k Dutch book corpus and got 93.9% accuracy (forward model only), which is better than the results in this paper: https://arxiv.org/pdf/1912.09582.pdf (ULMFiT 93.8%, BERT 93.0%).

Danielvs · March 13, 2021, 2:37pm

Very nice!

Beginner_NLP · March 13, 2021, 6:09pm

Sounds great, thanks for sharing! I’m excited to try it out!