Google BERT Language Models

Interesting discussion: https://github.com/google-research/bert/issues/66


Use the max sequence length of 512. That got me to 95% accuracy with the BERT base uncased model.
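
For reference, in the original google-research/bert code the sequence length is set via the `--max_seq_length` flag of `run_classifier.py`. A sketch of the invocation, with directory placeholders (`$BERT_BASE_DIR`, `$DATA_DIR`, `$OUTPUT_DIR`) to fill in:

```
python run_classifier.py \
  --task_name=cola \
  --do_train=true \
  --do_eval=true \
  --data_dir=$DATA_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=512 \
  --output_dir=$OUTPUT_DIR
```

Keep in mind that `max_seq_length=512` raises memory requirements considerably; the repo's README discusses out-of-memory issues and suggests reducing the batch size at longer sequence lengths.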


I've reduced my dataset to include only documents of up to 512 tokens, but the accuracy got worse: 51%. I suspect the data reduction led to overfitting.

BTW, I haven't read BERT's source code yet, so I'd like to know: does BERT automatically "divide" documents longer than the max sequence length, assigning the same label to each piece? Or should I do that manually?

An update: I added a preprocessing step to break the documents into chunks of at most 512 words (even though BERT applies WordPiece tokenization, so word counts only approximate token counts), and my model's accuracy jumped to 76%!
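
For anyone wanting to replicate this: the original `run_classifier.py` truncates inputs longer than `max_seq_length` rather than splitting them, so chunking has to happen in preprocessing. A minimal sketch (the function name and the `(text, label)` data layout are my own, not from the post above):

```python
def chunk_documents(docs, max_words=512):
    """Split each (text, label) pair into whitespace-delimited chunks of
    at most max_words words, repeating the label for every chunk."""
    chunks = []
    for text, label in docs:
        words = text.split()
        for i in range(0, len(words), max_words):
            chunks.append((" ".join(words[i:i + max_words]), label))
    return chunks

# Toy example: a 1200-word document yields 3 chunks (512 + 512 + 176)
docs = [("word " * 1200, "positive")]
print(len(chunk_documents(docs)))  # 3
```

Since WordPiece generally produces at least as many tokens as there are whitespace words, 512-word chunks will still be truncated at `max_seq_length=512`; a smaller chunk size (say, 400 words) leaves some headroom.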


Have you been using BERT with fastai via the Hugging Face port to PyTorch? If so, can you offer any insights into how you got it working?


No, I've been using BERT's "original" Google version. No PyTorch yet.


Unsupervised Data Augmentation (UDA) has the best reported performance of any method using BERT for text classification.