[ EDIT 10/20/2019 ] I’ve updated the notebook lm3-french-classifier-amazon.ipynb (nbviewer of the notebook) with the right code to load the SentencePiece model and vocab trained for the general LM into the specialized LM and the classifier (see the explanations at the top of the notebook).
(MultiFiT) French Bidirectional Language Model (FBLM)
As announced in this previous post, I’ve finally trained a third French Bidirectional Language Model with the MultiFiT configuration. This LM performs better than my previous ones.
Architecture, training method and performance
You will find the notebook lm3-french.ipynb used to train the model, along with the links to download the model parameters and vocab, in my Language Models GitHub repository.
The architecture used for this FBLM is 4 QRNN layers with 1550 hidden units per layer. The SentencePiece tokenizer (vocab of 15 000 tokens) was used instead of the spaCy one.
Both the forward and backward models were trained for 10 epochs with a batch size of 50.
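The setup above can be summarized in a fastai-style config dict. This is only a sketch: the key names follow fastai v1's `awd_lstm_lm_config`, and `emb_sz=400` (the MultiFiT default) is an assumption, since the embedding size is not stated here.

```python
# Sketch of the MultiFiT LM configuration described above.
# Key names follow fastai v1's awd_lstm_lm_config; emb_sz is assumed.
multifit_lm_config = dict(
    emb_sz=400,    # embedding size (MultiFiT default, assumed)
    n_hid=1550,    # 1550 hidden units per layer
    n_layers=4,    # 4 QRNN layers
    qrnn=True,     # QRNN cells instead of AWD-LSTM ones
)

training_params = dict(
    vocab_sz=15_000,  # SentencePiece vocab size
    bs=50,            # batch size
    epochs=10,        # epochs for each direction (forward and backward)
)
```

The full training call is in the lm3-french.ipynb notebook linked above.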
As the FBLM was trained on a big corpus of 100 million tokens (an extraction from the French Wikipedia dump of about 500 million tokens), I did not need much regularization. Therefore, I set drop_mult to zero (no dropout) and kept the default fastai weight decay of 0.01.
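To illustrate what these two regularization settings mean (a minimal sketch, not fastai internals): `drop_mult` multiplies every dropout probability of the model, so 0 disables dropout entirely, while decoupled weight decay shrinks the weights slightly toward zero at each optimizer step.

```python
def scaled_dropouts(base_dropouts, drop_mult):
    """drop_mult multiplies every dropout probability of the model;
    drop_mult=0 therefore disables dropout entirely."""
    return [p * drop_mult for p in base_dropouts]

def sgd_step_with_weight_decay(w, grad, lr=0.001, wd=0.01):
    """Decoupled weight decay: shrink the weight toward zero,
    then apply the usual gradient step."""
    return w * (1 - lr * wd) - lr * grad

print(scaled_dropouts([0.1, 0.3, 0.5], drop_mult=0))  # [0.0, 0.0, 0.0]
```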
For the training, I used one NVIDIA V100 GPU on GCP.
| model | accuracy | perplexity | training time |
| --- | --- | --- | --- |
| forward | 39.68% | 21.76 | 8h |
| backward | 43.67% | 22.16 | 8h |
PS: the training times given in the table are the sum of the fastai DataBunch creation time and the model training time over 10 epochs.
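For reference, the perplexity values in the table are just the exponential of the validation cross-entropy loss, so it is easy to convert between the two (a quick self-contained sketch):

```python
import math

def perplexity(cross_entropy_loss):
    # perplexity = exp(cross-entropy loss, in nats)
    return math.exp(cross_entropy_loss)

# the forward model's perplexity of 21.76 corresponds to a
# validation loss of about 3.08
print(round(math.log(21.76), 2))  # 3.08
```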
Sentiment Classifier on the “French Amazon Customer Reviews”
Finally, I fine-tuned this French Bidirectional LM to train a Sentiment Classifier on the “French Amazon Customer Reviews” dataset (see notebook lm3-french-classifier-amazon.ipynb (nbviewer of the notebook)).
The performance of my MultiFiT SentencePiece (15 000 tokens) French classifier is similar to that of the ULMFiT spaCy (60 000 tokens) one.
If you take into account that training the FBLM with the MultiFiT configuration was 5 hours faster than with the ULMFiT one, and that the FBLM performance was better as well (see the comparison tables on this page), using the MultiFiT configuration is a good choice for fine-tuning a Sentiment Classifier from a Language Model.
| Language Models | accuracy | f1 score |
| --- | --- | --- |
| MultiFiT with 4 QRNN + SentencePiece (15 000 tokens) | 95.74% | 0.9758 |
| ULMFiT with 3 QRNN + SentencePiece (15 000 tokens) | 93.70% | 0.9636 |
| ULMFiT with 3 AWD-LSTM + spaCy (60 000 tokens) | 95.97% | 0.9770 |
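As a reminder of how the two metrics in the table differ (a small self-contained sketch, not the notebook's code): accuracy counts all correct predictions, while the F1 score is the harmonic mean of precision and recall on the positive class, so it penalizes false positives and false negatives more directly.

```python
def accuracy(tp, tn, fp, fn):
    # fraction of all predictions that are correct
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    # harmonic mean of precision and recall on the positive class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# toy example: 9 true positives, 9 true negatives,
# 1 false positive, 1 false negative
print(round(f1_score(9, 1, 1), 4))  # 0.9
```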
Bonus: I love the show_intrinsic_attention() function, which allows you to visualize the words that contributed the most to the classifier's decision. Below is an example with a French product review.