ULMFiT - French


(Thomas Chambon) #1

I have worked on implementing ULMFit for the French language using fastai v1.

I have created two datasets for this task:

  • Language model: an extract of French Wikipedia (100M tokens) with a 30K vocab.
  • Classification (sentiment analysis): movie reviews from an IMDb-like French website. The dataset contains 11K positive reviews, 11K negative reviews, as well as 51K unlabelled reviews for language model fine-tuning.
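As an aside (not from the post): capping the vocabulary at 30K usually means keeping the most frequent tokens and mapping everything else to an unknown token. A minimal sketch with a toy corpus, borrowing fastai's `xxunk` token name:

```python
from collections import Counter

def build_vocab(tokens, max_vocab=30000, unk="xxunk"):
    """Keep the max_vocab most frequent tokens; everything else maps to unk."""
    counts = Counter(tokens)
    # Reserve slot 0 for the unknown token.
    itos = [unk] + [tok for tok, _ in counts.most_common(max_vocab - 1)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

tokens = "le chat mange le poisson et le chat dort".split()
itos, stoi = build_vocab(tokens, max_vocab=4)
# Tokens outside the vocab fall back to the unknown id.
numericalized = [stoi.get(t, stoi["xxunk"]) for t in tokens]
```

In practice the counts would come from the tokenized Wikipedia extract, and a real fastai vocab includes a few more special tokens.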

My results so far:

  • Language model trained on the 100M-token corpus with a 30K vocab: accuracy of 0.3570, perplexity of 24.36.
  • Classification: accuracy of 0.9349 using the pretrained LM and fine-tuning it on the 51K unlabelled reviews.
    Without the pretrained LM, I still get 0.89 accuracy by training the LM from scratch on the 51K unlabelled reviews.
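For context on the numbers above: fastai reports language-model perplexity as the exponential of the validation cross-entropy loss, so a perplexity of 24.36 corresponds to a loss of about 3.19:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is the exponential of the (natural-log) cross-entropy loss."""
    return math.exp(cross_entropy_loss)

# A perplexity of 24.36 corresponds to a validation loss of ln(24.36) ≈ 3.19.
loss = math.log(24.36)
```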

It seems there is no public benchmark for the French language. I am working on a blog post to present the model.

I am trying to contact and convince a French movie-review website to release a labelled dataset of movie reviews, to create a first benchmark.

The code of the classifier and the model weights can be found on GitHub here: https://github.com/tchambon/deepfrench


Language Model Zoo :gorilla:
#2

Hi,

You could also use this dataset: https://deft.limsi.fr/2017/
It's not movie reviews but tweets, and there are four labels (objectif, positif, négatif, or mixte). For simplicity, you could just use the positive and negative samples and use their results as a baseline.
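Restricting the four labels to a binary sentiment task is a one-line filter. A sketch with made-up tweets (the label names are the ones above; the `(text, label)` tuple format is just an assumption about how the data gets loaded):

```python
# Hypothetical (text, label) pairs; the real DEFT data would be loaded from files.
tweets = [
    ("super film !", "positif"),
    ("horaires des séances", "objectif"),
    ("quel navet...", "négatif"),
    ("bon début mais fin ratée", "mixte"),
]

# Keep only the positive/negative samples for a binary sentiment baseline.
binary = [(text, label) for text, label in tweets if label in ("positif", "négatif")]
```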

I would be very interested in seeing the results, as I'm trying to build a French model myself.


(Thomas Chambon) #3

Hi,

Thank you for the dataset link!
However, it seems the dataset is no longer publicly available. Have you found a way to get it?


(Piotr Czapla) #4

Tom, superb work! @jeremy have you seen this 93% accuracy? :slight_smile:
It seems that we have another superb result for ULMFiT. @tomsthom we should make it public on Twitter. Can you share your Twitter handle?

This is exactly what we are doing with Polish. It is going very slowly :). As a last resort, we are planning to simply publish a list of URLs and a way to fetch them yourself, so that anyone who wants to can verify the results.


(Piotr Czapla) #5

@claeyzre awesome find, I'm writing to the people running the competition to ask them to release the data so that we can compare BERT and ULMFiT.

@tomsthom can you add the dataset and the paper to the thread above and state the SOTA?

Btw @jeremy can you make the first post a wiki? We don't have permission to do that, even as authors of the post.


(Thomas Chambon) #6

My Twitter handle is @tomsthom.

Claeyzre sent me the dataset, and I will train on it to see how ULMFiT and BERT perform.


(Piotr Czapla) #7

@Claeyzre, can you share the dataset as well? Do you have a license for it? I would like to see how good BERT is at text classification.


#8

To be honest, I found it on a random gitlab page here. I don’t know if there is any license attached to it :confused:


(Vishwa Nath Jha) #9

I've been struggling to do the same thing for #HINDI using fastai v1. Any link to the code for reference? I have been able to build the language model; however, I am facing issues transferring it to text classification. Much appreciated.


(Thomas Chambon) #11

I will share the code on GitHub very soon; I need to do some cleaning first.
If you need it urgently, I can send you a working version as an example (but not cleaned up).


(Vishwa Nath Jha) #12

A working version would help to begin with. I'll push my code to GitHub as soon as it's successfully tested. Thanks again!


(Vishwa Nath Jha) #13

Hi @tomsthom! Looking forward to your reply. Thanks!


(Thomas Chambon) #14

I sent you a link with the notebook by PM.
After cleaning the code and updating it for the latest fastai v1 release, I will publish it on GitHub.


(Vishwa Nath Jha) #15

Thanks a ton @tomsthom! Let me work this out for Hindi and I'll post my results. Keeping my fingers crossed.


(Quentin Retourne) #16

@tomsthom I would love to see the code as well. Please let us know when it is available on GitHub :slight_smile:


(Vintila Claudiu) #17

Nice work guys!
Would love to test your model or see the code if you are willing to share it.

I am currently trying to train a typo-correction tool based on SymSpell, and I am looking for a sound French corpus. Any suggestions?


(Thomas Chambon) #18

I will publish the code this week. It will work with the latest changes in fastai v1.

For a French corpus, the simplest option is to start with French Wikipedia (more information on how to extract it is in the LM Zoo topic: Language Model Zoo 🦍).


(Thomas Chambon) #19

@piotr.czapla did you get an answer about the data of the DEFT competition?

Running ULMFiT on the 4-class tweet classification, I can easily get a macro F-score around 0.54 (which could be improved with more hyperparameter tuning).
The competition results I have seen (https://deft.limsi.fr/2017/actes_DEFT_2017.pdf#page=107) show a best macro F-score of 0.276, so this would be a huge improvement over the SOTA!
But we have to confirm that we have the correct data (since it comes from an unofficial GitHub repo) and that this PDF shows the best official competition results.


Study Group in French
(Piotr Czapla) #20

I haven't sent them a request, since @claeyzre found the data, so we can train and see. But indeed, it would be good to double-check that they are okay with us using their data.
You have superb results, if we haven't made a mistake. F1 is tricky, as there are different, incompatible implementations of F1 micro. For example, scikit-learn calculates F1 differently than the Wikipedia pages describe, and what is worse, the results differ a lot.

For German, I calculated the F1 by hand, using the data from the paper to reverse-engineer the formula used in the competition, and then implemented the F1 calculation for my scripts using numpy.
I think we can do the same for the DEFT paper. 0.5 would be an amazing result.
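For anyone wanting to do the same, a macro F1 computed from per-class counts with numpy might look like this (a sketch of the standard definition, not necessarily the exact formula the competition used; undefined per-class F1s are counted as 0, matching scikit-learn's default):

```python
import numpy as np

def f1_macro(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> float:
    """Mean of per-class F1 scores; an undefined per-class F1 counts as 0."""
    f1s = []
    for c in range(n_classes):
        tp = int(np.sum((y_pred == c) & (y_true == c)))
        fp = int(np.sum((y_pred == c) & (y_true != c)))
        fn = int(np.sum((y_pred != c) & (y_true == c)))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); 0 if the class never appears at all.
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))

# Toy example: class 2 is never predicted, so its F1 counts as 0.
score = f1_macro(np.array([0, 0, 1, 1, 2, 2]), np.array([0, 0, 1, 1, 1, 1]), 3)
```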

Please share the code. How about integrating into ulmfit-multilingual?


(Thomas Chambon) #21

Ok, I will contact the organizers of the competition to try to get their approval and the official data/results.

For F1 macro (which seems to be the metric of the competition, not F1 micro), I used two different implementations: one based on sklearn and a custom one I coded using the Wikipedia formulas. As you said, the results are not identical (I think this is because, when one class is never predicted, sklearn uses an F-score of 0 for it, which lowers the F1 macro), but they are very close: 0.54 is the sklearn result, and my custom implementation gives a slightly better score.
This is the sklearn-based implementation that gave me 0.54:

    from sklearn.metrics import f1_score
    from torch import Tensor

    def f1_sklearn(y_pred: Tensor, y_true: Tensor) -> Tensor:
        # Turn per-class scores into predicted class indices.
        y_pred = y_pred.max(1)[1]
        # sklearn converts CPU tensors to arrays internally.
        res = f1_score(y_true, y_pred, average='macro')
        # fastai metrics are expected to return a scalar tensor.
        return Tensor([res]).float().squeeze()

I should be able to share the full code next Monday.
Yes, it's a good idea to integrate it into ulmfit-multilingual, since it's used a lot. There is already an fbeta metric in fastai, but it does not handle multiclass (with macro or micro averaging).
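To make the discrepancy concrete: when one class is never predicted, scikit-learn's macro F1 (the mean of per-class F1s, with the undefined one counted as 0) differs from the other common convention of taking the F1 of the macro-averaged precision and recall. A small sketch:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1])  # class 2 is never predicted

# Convention 1 (scikit-learn): mean of per-class F1s, undefined F1 counted as 0.
f1_a = f1_score(y_true, y_pred, average='macro', zero_division=0)

# Convention 2: F1 of the macro-averaged precision and recall.
p = precision_score(y_true, y_pred, average='macro', zero_division=0)
r = recall_score(y_true, y_pred, average='macro', zero_division=0)
f1_b = 2 * p * r / (p + r)
```

The two numbers disagree on the same predictions, which is why nailing down the competition's exact formula matters before claiming an SOTA improvement.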