Interesting discussion: https://github.com/google-research/bert/issues/66
Use the max sequence length of 512. That got me to 95% accuracy with the BERT base uncased model.
I’ve reduced my dataset to include only the documents of up to 512 tokens, but the accuracy got worse: 51%. I suspect the data reduction led to overfitting.
BTW, I haven’t read BERT’s source code yet, so I’d like to know whether BERT automatically “divides” documents longer than the max sequence length, keeping the same label for each piece, or whether I should do that manually.
An update: I added a preprocessing step to break the documents into chunks of at most 512 words (even though BERT actually applies WordPiece tokenization, so the token count per chunk can be higher), and my model’s accuracy jumped to 76%!
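For what it’s worth, here is a minimal sketch of that kind of preprocessing step. The function name and the 512-word limit are just assumptions for illustration; since WordPiece usually produces more tokens than whitespace words, a smaller limit is safer in practice:

```python
def chunk_document(text, label, max_words=512):
    """Split a document into chunks of at most `max_words` whitespace-separated
    words, assigning the parent document's label to every chunk.

    Note: BERT tokenizes with WordPiece, so a 512-word chunk can still exceed
    512 tokens; pick a smaller max_words (e.g. 400) to leave headroom.
    """
    words = text.split()
    return [(" ".join(words[i:i + max_words]), label)
            for i in range(0, len(words), max_words)]

# Example: a 5-word document with max_words=2 yields three labeled chunks.
chunks = chunk_document("the quick brown fox jumps", "positive", max_words=2)
# → [("the quick", "positive"), ("brown fox", "positive"), ("jumps", "positive")]
```

At prediction time you would then need some way to combine the per-chunk predictions back into one document-level label, e.g. by majority vote or by averaging the class probabilities.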
Have you been using BERT with fastai via the Hugging Face port to PyTorch? If so, can you offer any insights about how you got it working?
No, I’ve been using BERT’s “original” Google version. No PyTorch yet.