ULMFiT - Chinese (Simplified) in progress

piotr.czapla · November 7, 2018, 9:21pm

Exactly! SentencePiece (SP) is also by Google, and it seems better suited to Chinese than WordPiece because it does not need spaces in the input sentence. I will try to find out more about this.
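To make the "no spaces needed" point concrete, here is a toy sketch in plain Python. It is not the SentencePiece algorithm or its API: the vocabulary `TOY_VOCAB` and the helper `greedy_subword_tokenize` are made up for illustration, and a simple longest-match segmentation stands in for a learned subword model. It only shows that splitting on whitespace yields nothing useful for a Chinese sentence, while a segmenter that works directly on the raw character stream does not care whether spaces are present.

```python
# Toy illustration only: NOT the SentencePiece algorithm or API.
# TOY_VOCAB and greedy_subword_tokenize are hypothetical names for this sketch.

def whitespace_tokenize(text):
    """Split on whitespace, as a space-dependent pre-tokenizer would."""
    return text.split()


def greedy_subword_tokenize(text, vocab):
    """Longest-match-first segmentation over raw characters (no spaces required)."""
    max_len = max(len(piece) for piece in vocab)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                pieces.append(candidate)
                i += size
                break
    return pieces


TOY_VOCAB = {"我", "喜欢", "自然", "语言", "处理"}
sentence = "我喜欢自然语言处理"  # "I like natural language processing"

print(whitespace_tokenize(sentence))                 # one single "token": the whole sentence
print(greedy_subword_tokenize(sentence, TOY_VOCAB))  # several subword pieces
```

A real SentencePiece model learns its vocabulary from raw text instead of using a hand-written set like the one above, so treat this purely as an intuition aid.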

We could also try QRNN; it is a faster architecture than LSTM and gives similar results. Given that BERT has been released for Chinese, speed might be quite an important feature of ULMFiT.

Btw, Sebastian Ruder is working on moving ULMFiT to fastai v1, and he wants to add SentencePiece tokenization and clean up the API. Maybe you want to join the effort? If so, have a look here: Multilingual ULMFiT - #5 by piotr.czapla