Developer chat

sgugger · November 20, 2018, 6:18pm

After experimenting a bit, and going back and forth, we finally settled on adding a MAJ token: each word that begins with a capital is lower cased (as before) but we add xxmaj in front of it to tell the model. It appears to help a little bit.
There is a new pretrained model to match that change: you’ll find it in URLs.WT103_1
The text example notebook has been updated to use it (and went from 79% to 84.5% accuracy in the process!)