I am using ULMFiT to fine-tune a model.
I now have a new vocabulary.
Some of my input texts contain several newline characters in a row, for example '\n\n\n'.
When I tokenize such a text, the run is not recognized as three separate newline characters. Instead, the whole run is counted as a single unknown token.
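To make the problem concrete, here is a toy sketch of what I think is happening. The vocabulary, the `xxunk` placeholder, and the `numericalize` helper below are hypothetical stand-ins I wrote for illustration, not the actual library code:

```python
# Toy illustration only: this vocab and helper are made up for the example.
UNK = "xxunk"
itos = [UNK, "\n", "hello", "world"]            # index -> token
stoi = {tok: i for i, tok in enumerate(itos)}   # token -> index

def numericalize(tokens):
    """Map each token to its vocab index, falling back to the UNK index."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]

# What I seem to get: the run of newlines is kept as one token,
# which is not in the vocab, so it maps to UNK.
actual = numericalize(["hello", "\n\n\n", "world"])

# What I expected: three separate newline tokens, each in the vocab.
expected = numericalize(["hello", "\n", "\n", "\n", "world"])

print(actual)    # [2, 0, 3]
print(expected)  # [2, 1, 1, 1, 3]
```

In other words, a single '\n' is in my vocabulary, but the three-newline run ends up as one out-of-vocabulary item.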
Is there a way to fix this?