I’m new to fastai, as well as to deep learning in general :). I want to build a text classifier using the ULMFiT approach, and I was wondering whether there are any pretrained language models available for Dutch. I found one for fastai v1, but I assume I won’t be able to use it with the new version of fastai? Thanks!
Using the pretrained weights from fastai v1 should be possible with fastai2. At some point the hidden dimension size changed, but that should also be fixable; see the thread "Language_model_learner not working as before?".
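Roughly, loading old weights could look like the sketch below. It is untested against your specific checkpoint and rests on a few assumptions: that the v1 model was trained with a hidden size of 1150 (check the shapes in your checkpoint), that the weights (`.pth`) and vocab (`.pkl`) files sit in the learner's model directory, and that `dls` is a language-model `DataLoaders` you have already built. The function names `v1_compatible_config` and `build_dutch_lm_learner` are made up for this example; only `AWD_LSTM`, `awd_lstm_lm_config` and `language_model_learner` come from fastai.

```python
def v1_compatible_config(base_config, n_hid=1150):
    """Return a copy of an AWD-LSTM config with the hidden size overridden.

    n_hid=1150 is an assumption about the old checkpoint; adjust it to
    whatever hidden size your weights were actually trained with.
    """
    config = dict(base_config)  # copy so fastai's default dict is not mutated
    config["n_hid"] = n_hid
    return config


def build_dutch_lm_learner(dls, weights_name, vocab_name, n_hid=1150):
    """Hypothetical helper: build a language-model learner from old weights.

    weights_name and vocab_name are file names WITHOUT extension; fastai
    looks for <weights_name>.pth and <vocab_name>.pkl in the model directory.
    """
    # Imported lazily so the config helper above works without fastai installed.
    from fastai.text.all import AWD_LSTM, awd_lstm_lm_config, language_model_learner

    config = v1_compatible_config(awd_lstm_lm_config, n_hid)
    return language_model_learner(
        dls,
        AWD_LSTM,
        config=config,
        pretrained=True,
        pretrained_fnames=[weights_name, vocab_name],
    )
```

If the hidden size in the config does not match the checkpoint, loading should fail with a shape mismatch, which is a quick way to find out which value you need.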
If you have access to a GPU, pretraining the weights is not too hard. I recently released a GitHub repo for ULMFiT training:
I’m improving my scripts and need some languages to experiment with … so I’ll probably train a Dutch (nl) language model over the weekend. I’ll keep you posted.
I’ve trained the Dutch model. You’ll find the link in the repo. Let me know if you have any problems with it :).
Edit: I evaluated the language model on the 110k Dutch book corpus and got 93.9% accuracy (forward model only), which is better than the results reported in this paper: https://arxiv.org/pdf/1912.09582.pdf (ULMFiT 93.8%, BERT 93.0%).