I am sorry for the late reply; I had my whole heart set on organizing everything together first. The result is this post (and the releases that follow).
It will be a series, from pretraining an MLM model to multi-task finetuning on GLUE. You can follow my Twitter for updates on the series.
I would also be grateful if you could try the code on your GPUs with a larger corpus to see whether it truly reaches the reported accuracy (which it should), and help me debug the fp16 issue mentioned in the post.