ULMFIT - Punjabi

(Gaurav) #1

Starting this thread to share the progress on the Punjabi LM and classification results @piotr.czapla @Moody

Repository: NLP for Punjabi

Datasets:

  • Download the Wikipedia articles dataset (44,000 articles), which I scraped, cleaned, and trained the model on, from here
  • Check out the BBC Punjabi News dataset, which I scraped, cleaned, and trained the model on, from here

Results:

Perplexity of Language Model: ~13 (on 20% validation set)

Kappa Score of classification model: ~60

Accuracy of classification model: 89%

The classification results above were obtained on a validation set with ~84% negatives and ~16% positives.
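For readers comparing language models, here is a minimal sketch of how a perplexity figure like the one above relates to the validation loss. It assumes the loss is the mean per-token cross-entropy in nats, which is an assumption about the training setup and is not stated in this thread. The function name is hypothetical.

```python
import math

def perplexity_from_loss(mean_cross_entropy_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy.

    Assumes the loss is measured in nats (natural log); this is an
    assumption, not something confirmed by the thread.
    """
    return math.exp(mean_cross_entropy_nats)

# A perplexity of ~13 corresponds to a loss of ln(13), about 2.56 nats.
loss_for_ppl_13 = math.log(13)
ppl = perplexity_from_loss(loss_for_ppl_13)
print(f"loss={loss_for_ppl_13:.2f} -> perplexity={ppl:.1f}")
```

Note that perplexity is computed per token, so values are only comparable between models that share the same tokenizer and vocabulary.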

Pretrained Language Model

Download pretrained Language Model from here

Classifier

Download classifier from here

Tokenizer

Unsupervised training using Google’s SentencePiece

Download the trained model and vocabulary from here


Language Model Zoo :gorilla:
(Piotr Czapla) #2

Hi Gaurav, good work! What accuracy did you get? I want to compare the results to the LASER results on MLDoc.


(Gaurav) #3

Accuracy would have been the wrong metric for the above dataset, as it is highly unbalanced, with

114 Positive Examples
670 Negative Examples

Hence, I calculated Kappa Score (~49) and didn’t calculate accuracy.
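To illustrate the point about accuracy on this split, here is a minimal sketch that computes accuracy and Cohen's kappa for a trivial classifier that always predicts the negative class, using the 114 / 670 counts above. The helper `cohen_kappa` is a hypothetical, stdlib-only implementation written for illustration (it is not the code used for the reported results), and it returns kappa on the usual -1 to 1 scale; the scores quoted in this thread appear to be on a 0 to 100 scale, which is an assumption.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (plain accuracy) and p_e is the
    agreement expected by chance from the two label distributions.
    """
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    if p_e == 1.0:  # both label lists contain a single identical class
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)

# Label counts from the post above: 114 positives, 670 negatives.
y_true = [1] * 114 + [0] * 670

# A classifier that ignores its input and always predicts "negative".
always_negative = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
kappa = cohen_kappa(y_true, always_negative)
print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

The always-negative baseline reaches roughly 85% accuracy while its kappa is zero, which is why kappa is the more informative number on this dataset.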


(Piotr Czapla) #4

I see. We use accuracy for our evaluations, though, so maybe you could cut out a balanced test set?
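One way to cut out a balanced set is to downsample every class to the size of the smallest one. The sketch below is a minimal illustration under that assumption; the function `balanced_subset`, the `(text, label)` tuple layout, and the placeholder documents are all hypothetical and not taken from the repository.

```python
import random
from collections import defaultdict

def balanced_subset(examples, label_of, seed=0):
    """Downsample each class to the size of the smallest class.

    `examples` is any list of items and `label_of` extracts the class
    label from one item. A fixed seed keeps the subset reproducible.
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    k = min(len(items) for items in by_label.values())
    rng = random.Random(seed)

    subset = []
    for items in by_label.values():
        subset.extend(rng.sample(items, k))  # sample without replacement
    rng.shuffle(subset)
    return subset

# Placeholder data with the class counts mentioned in this thread.
data = [(f"pos_doc_{i}", 1) for i in range(114)]
data += [(f"neg_doc_{i}", 0) for i in range(670)]

balanced = balanced_subset(data, label_of=lambda ex: ex[1])
print(len(balanced), sum(label for _, label in balanced))
```

With only 114 positives, the balanced set is small (228 examples here), so accuracy measured on it will be fairly noisy.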

Do you have any similar corpus that contains sentences with sentiment? It does not need to have labels; tweets would be fine, or product reviews / comments.
If so, you could fine-tune the LM on that dataset and you should get much better results. It would be interesting to see how much you can improve.


(Gaurav) #5

Yes, sure, I’ll do this and report back.

Unfortunately no. :frowning: But I’ll check again if I can get/scrape a balanced/better dataset from somewhere!


(Gaurav) #6

Hey, I’ve updated the notebook and GitHub repo to reflect that the above classification results (89% accuracy and ~60 kappa score) were obtained on a validation set with ~84% negatives and ~16% positives. Do you think that would help with reproducibility?
