ULMFiT - Hungarian

Continuing the discussion from Language Model Zoo.

I'm starting this thread to share progress on the Hungarian language model (aka ‘Lángos’) and its classification results.

repository: github/pmamico/langos

Current results

Wikipedia-based Language Model

  • Perplexity of language model: ~26
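The post does not say how the perplexity was computed. Presumably it uses the usual definition: the exponential of the mean per-token cross-entropy (in nats) on held-out text, which is what `exp(validation loss)` gives for a fastai language model. Under that assumption, this short sketch shows the relationship and what a perplexity of ~26 means in terms of loss:

```python
import math

def perplexity_from_probs(token_probs):
    """Perplexity from the probabilities the model assigned to each actual next token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

def perplexity_from_loss(mean_cross_entropy):
    """Perplexity from a mean cross-entropy loss measured in nats (natural log)."""
    return math.exp(mean_cross_entropy)

# A model that always gives the true next token probability 1/26 has perplexity 26,
# i.e. it is as uncertain as a uniform choice among 26 tokens.
print(round(perplexity_from_probs([1 / 26] * 100), 6))  # 26.0
# Conversely, a perplexity of ~26 corresponds to a loss of ln(26), about 3.26 nats per token.
print(round(math.log(26), 2))  # 3.26
```

If the loss were logged in bits (log base 2), you would use `2 ** loss` instead of `math.exp`.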

Sentiment Classifier

  • Accuracy of classification model: ~89%

Dataset

Pretrained Language Model

Download the pretrained language model from here and the vocab here.
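The post does not state the vocab file's format. Here is a minimal sketch that assumes it is a pickled Python list of tokens ("itos", index-to-string) as in the fastai 0.7 ULMFiT examples, with the unknown token at index 0. The file name `itos.pkl` and the tiny demo vocabulary are hypothetical:

```python
import collections
import os
import pickle
import tempfile

def load_vocab(path):
    """Load an index-to-string list (itos) from a pickle file and build the reverse map."""
    with open(path, "rb") as f:
        itos = pickle.load(f)
    # Assumption: itos[0] is the unknown token, so out-of-vocabulary words map to id 0.
    stoi = collections.defaultdict(lambda: 0, {tok: i for i, tok in enumerate(itos)})
    return itos, stoi

def numericalize(tokens, stoi):
    """Map tokens to ids; unknown tokens fall back to id 0."""
    return [stoi[t] for t in tokens]

# Demo with a tiny hypothetical vocab written to a temporary file.
demo_itos = ["_unk_", "_pad_", "a", "lángos", "finom"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "itos.pkl")
    with open(vocab_path, "wb") as f:
        pickle.dump(demo_itos, f)
    itos, stoi = load_vocab(vocab_path)

print(numericalize(["a", "lángos", "nagyon", "finom"], stoi))  # [2, 3, 0, 4]
```

The ids must come from the same vocab the model was trained with. Otherwise the embedding rows no longer line up with the tokens.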

Classifier

Download the classifier from here and the encoder here.

Next steps

  • Classifier tuning
    The goal for sentiment classification is 93% accuracy!
  • LM fine-tuning
    In the future I plan to fine-tune the Wiki LM on classical literature.
    For this task I have 75 books from the BME Corpus Project (see here), and the goal is 500.
    If you have any classical books as plain-text UTF-8 files, sharing them would help me reach that goal!
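Fine-tuning the Wiki LM on a literature corpus usually produces a vocabulary that differs from the Wikipedia one. The fastai ULMFiT IMDb example handles this by copying the pretrained embedding row for every token the two vocabularies share and giving new tokens the mean of all pretrained rows. Here is a minimal NumPy sketch of that remapping. It assumes the embedding matrix has already been pulled out of the checkpoint as a plain array whose rows follow `old_itos`. The checkpoint keys are not shown, and the toy vocabularies are hypothetical:

```python
import numpy as np

def remap_embeddings(old_emb, old_itos, new_itos):
    """Build an embedding matrix for new_itos from a pretrained matrix indexed by old_itos.

    Tokens present in both vocabularies keep their pretrained vector; tokens that
    only occur in the new corpus are initialised with the mean pretrained vector.
    """
    old_stoi = {tok: i for i, tok in enumerate(old_itos)}
    mean_row = old_emb.mean(axis=0)
    new_emb = np.empty((len(new_itos), old_emb.shape[1]), dtype=old_emb.dtype)
    for i, tok in enumerate(new_itos):
        j = old_stoi.get(tok)
        new_emb[i] = old_emb[j] if j is not None else mean_row
    return new_emb

# Toy example: 4 tokens with 2-dimensional embeddings.
old_itos = ["_unk_", "_pad_", "a", "király"]
old_emb = np.arange(8, dtype=np.float32).reshape(4, 2)
new_itos = ["_unk_", "_pad_", "király", "vitéz"]  # "vitéz" is new in the literature corpus
new_emb = remap_embeddings(old_emb, old_itos, new_itos)
print(new_emb)
```

If the LM ties its input embedding and output decoder weights, as ULMFiT's AWD-LSTM does, the same remapped matrix would be used for both.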