To validate our finetuning script, I trained models and compared the results with Table 8 of the ELECTRA paper. The results are even slightly better than the paper's!
Model | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE | Avg. |
---|---|---|---|---|---|---|---|---|---|
ELECTRA-Small | 54.6 | 89.1 | 83.7 | 80.3 | 88.0 | 79.7 | 87.7 | 60.8 | 78.0 |
ELECTRA-Small (finetuned with fastai) | 52.8 | 89.8 | 84.5 | 83.6 | 88.7 | 80.4 | 88.9 | 65.2 | 79.2 |
ELECTRA-Small++ | 55.6 | 91.1 | 84.9 | 84.6 | 88.0 | 81.6 | 88.3 | 63.6 | 79.7 |
- Results are on the test set.
- No ensembling and no task-specific tricks; we only pick the best of 10 trained models.
- It is actually unclear to me whether `electra-small-discriminator` on the Hugging Face Hub is ELECTRA-Small or ELECTRA-Small++ (note that the ELECTRA-Small entry on the GLUE leaderboard is actually ELECTRA-Small++).
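The "best of 10 trained models" selection above can be sketched as a small helper. Here `train_fn` and `fake_train` are hypothetical stand-ins for a full finetuning run that returns a model and its dev-set score; this is a minimal sketch of the selection logic, not the actual training code.

```python
import random


def pick_best_run(train_fn, n_runs=10, seeds=None):
    """Run `train_fn` with `n_runs` different seeds and keep the single
    run with the highest dev-set score (no ensembling, no task tricks).

    `train_fn(seed)` is a hypothetical stand-in for one full finetuning
    run; it must return a `(model, dev_score)` pair."""
    seeds = seeds if seeds is not None else range(n_runs)
    best_model, best_score = None, float("-inf")
    for seed in seeds:
        model, score = train_fn(seed)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


# Toy stand-in: pretend the dev score varies with the random seed.
def fake_train(seed):
    random.seed(seed)
    return f"model_{seed}", 80 + random.random() * 5


model, score = pick_best_run(fake_train, n_runs=10)
```

The final test-set numbers in the table come from evaluating only the run selected this way, which is why no ensembling is involved.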
"Pretrain MLM and finetune on GLUE with fastai"

Previous posts:
- TextDataLoader: as fast as or faster, and also adds a sliding window, caching, and a progress bar
- Novel Hugging Face/nlp integration: train on and `show_batch` hf/nlp datasets
Also, follow me on Twitter (Richard Wang) for updates on this series.
Things on the way:
- Use one_cycle and fp16 to reproduce the results
- Pretrain ELECTRA-Small from scratch
- Ensembling and WNLI tricks (maybe)