Tokenizing long text into sentences

I’m wondering if there are any best practices for tokenizing longer text (e.g. 2000+ characters) into sentences automatically.

Only some of the commonly used tokenizers implement sentence tokenization; for example, NLTK's `sent_tokenize` and spaCy's `sentencizer` handle it directly.
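Here is a minimal sketch of what I mean, using NLTK (assuming `nltk` is installed and the punkt sentence model can be downloaded; the exact resource name may differ across NLTK versions):

```python
# Minimal sketch: splitting a long string into sentences with NLTK.
# Assumes `nltk` is installed; the punkt model is downloaded on first use.
import nltk

nltk.download("punkt", quiet=True)  # newer NLTK versions may also need "punkt_tab"

long_text = (
    "Sentence splitting is harder than it looks. Abbreviations like Dr. or e.g. "
    "mean you cannot simply split on periods. A trained model handles these cases."
)

sentences = nltk.sent_tokenize(long_text)
for s in sentences:
    print(s)
```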

Most other modern tokenizers, such as those in the tokenizers library, do not seem to implement sentence tokenization, and instead expect that your dataset has already been split into sentences (see the sketch below).
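This is the workflow I imagine is intended: pre-split the long string into sentences yourself, then feed the sentences to a subword tokenizer. A hedged sketch, assuming `spacy` and `transformers` are installed; `bert-base-uncased` is just an illustrative model choice, not a recommendation:

```python
# Sketch of the "pre-split first" workflow: segment text into sentences,
# then pass each sentence to a subword tokenizer.
import spacy
from transformers import AutoTokenizer

nlp = spacy.blank("en")          # blank pipeline, no statistical model needed
nlp.add_pipe("sentencizer")      # rule-based sentence boundary detection (spaCy v3 API)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice

long_text = "First sentence of a long document. Second sentence. And so on."
sentences = [sent.text for sent in nlp(long_text).sents]

# Tokenize sentence by sentence instead of the whole 2000+ character string.
encoded = tokenizer(sentences, truncation=True, padding=True)
print(encoded["input_ids"][0])
```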

Am I misunderstanding the problem – i.e. are there other ways I should be handling long strings before passing them to a word- or subword-level tokenizer?

P.S. I want to thank Rachel, Jeremy, Sylvain et al. for creating an awesome DL ecosystem and community. fastai is the reason I'm working on machine learning; their work is much appreciated.