How to deal with multiple newlines when tokenizing (ULMFiT)

ris · February 7, 2020, 2:56am

I am using ULMFiT to fine-tune a model.
I now have a new vocabulary.

My input text contains runs of consecutive newline characters, for example '\n\n\n'.
When I tokenize it, the run is not recognized as three separate newline characters. Instead, the whole run is treated as a single unknown token.
Is there a way to fix this?
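
One possible workaround, sketched here under assumptions rather than taken from any library's documented behaviour, is to rewrite each newline as its own space-padded marker before the text reaches the tokenizer, so that a run like '\n\n\n' becomes three separate tokens. The marker name `xxnl` and the whitespace-based `simple_tokenize` are hypothetical stand-ins for illustration; the real tokenizer would take the place of the `.split()` call.

```python
import re

# Hypothetical marker token standing in for a single newline (an assumption,
# not a name taken from the post or from any library).
NEWLINE_TOKEN = "xxnl"

def split_newline_runs(text: str) -> str:
    """Replace every newline with a space-padded marker, one marker per newline."""
    return re.sub(r"\n", f" {NEWLINE_TOKEN} ", text)

def simple_tokenize(text: str) -> list:
    """Stand-in tokenizer: apply the pre-processing rule, then split on whitespace."""
    return split_newline_runs(text).split()

tokens = simple_tokenize("first line\n\n\nsecond line")
print(tokens)
# ['first', 'line', 'xxnl', 'xxnl', 'xxnl', 'second', 'line']
```

For this to help, the marker also has to be in the vocabulary used for fine-tuning. If it is missing, each marker would still be mapped to the unknown token, just three times instead of once.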