Pretrain & Finetune MLM - 6: Reproduce GLUE finetuning results

To validate our finetuning script, I trained models and compared the results with Table 8 of the ELECTRA paper. The results are even a little better than the paper's! :heart_eyes: :heart_eyes: :heart_eyes:

| Model | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| ELECTRA-Small | 54.6 | 89.1 | 83.7 | 80.3 | 88.0 | 79.7 | 87.7 | 60.8 | 78.0 |
| ELECTRA-Small (finetuned with fastai) | 52.8 | 89.8 | 84.5 | 83.6 | 88.7 | 80.4 | 88.9 | 65.2 | 79.2 |
| ELECTRA-Small++ | 55.6 | 91.1 | 84.9 | 84.6 | 88.0 | 81.6 | 88.3 | 63.6 | 79.7 |
  • Results on the test set.
  • No ensembling and no task-specific tricks; only the best of 10 trained models is chosen.
  • One thing that confuses me: is electra-small-discriminator on the Hugging Face hub ELECTRA-Small or ELECTRA-Small++? (The ELECTRA-Small result on the GLUE leaderboard is actually ELECTRA-Small++.)
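If you want to sanity-check a single GLUE task yourself, here is a minimal sketch that finetunes electra-small-discriminator on MRPC with the Hugging Face transformers Trainer and the datasets library, rather than the fastai pipeline from this series. The choice of task, learning rate, batch size, and epoch count are illustrative assumptions, not the settings behind the table above.

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "google/electra-small-discriminator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# MRPC: sentence-pair paraphrase classification from GLUE.
raw = load_dataset("glue", "mrpc")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, max_length=128)

encoded = raw.map(tokenize, batched=True)

def accuracy(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

args = TrainingArguments(
    output_dir="electra-small-mrpc",
    learning_rate=3e-4,              # illustrative, not the exact value used for the table above
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer,   # enables dynamic padding via the default collator
                  compute_metrics=accuracy)
trainer.train()
print(trainer.evaluate())                # dev-set accuracy; the table reports GLUE test-server scores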

β€œPretrain MLM and fintune on GLUE with fastai”

Previous posts.

  1. MaskedLM callback and ELECTRA callback

  2. TextDataLoader - as fast as or faster, plus sliding window, caching, and a progress bar

  3. Novel Huggingface/nlp integration: train on and show_batch hf/nlp datasets

  4. Warm-up & linear-decay lr schedule + discriminative lr (see the sketch after this list)

  5. General Multi-task learning
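For readers who have not seen post 4, here is a minimal sketch of a warm-up + linear-decay learning-rate schedule built from fastai v2's scheduling primitives. The helper name `fit_warmup_linear_decay` and the `warmup_pct` default are my own assumptions, not the exact callback from that post, and discriminative lr (per-layer-group rates) is not shown.

```python
from fastai.callback.schedule import SchedLin, combine_scheds, ParamScheduler

def fit_warmup_linear_decay(learn, n_epoch, lr, warmup_pct=0.1):
    "Linearly warm the lr up to `lr`, then linearly decay it back to 0."
    sched = combine_scheds(
        [warmup_pct, 1 - warmup_pct],          # fraction of training spent in each phase
        [SchedLin(0.0, lr), SchedLin(lr, 0.0)],
    )
    learn.fit(n_epoch, cbs=ParamScheduler({'lr': sched}))
```

Usage would be e.g. `fit_warmup_linear_decay(learn, 3, 3e-4)` for any fastai `Learner`.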

Also, follow me on Twitter (Richard Wang) for updates on this series.

Things on their way

  • Use one_cycle and fp16 to reproduce the results (see the sketch after this list)
  • Pretrain ELECTRA-Small from scratch
  • Ensembling and WNLI tricks (maybe)
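As a rough illustration of the first item, this is what one-cycle plus fp16 looks like in fastai v2; the helper name and hyperparameter defaults are assumptions, not the settings the reproduction will actually use.

```python
from fastai.text.all import *   # provides Learner.to_fp16 and Learner.fit_one_cycle

def finetune_one_cycle_fp16(learn, n_epoch=3, lr_max=3e-4):
    "Mixed-precision finetuning with fastai's one-cycle lr schedule."
    learn = learn.to_fp16()                      # wrap the Learner for fp16 mixed precision
    learn.fit_one_cycle(n_epoch, lr_max=lr_max)  # warm up, then anneal the lr back down
```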

Great job @Richard-Wang! Thank you for sharing!


@Richard-Wang Your dedication to this so far has been amazing!! Keep up the excellent work!
