Deactivate built-in tokens, e.g. xxmaj

Is there a way of deactivating the automatic conversion of uppercase characters to lowercase in fastai’s tokenizers for language modelling? I’m trying to build a LM with SMILES strings for chemometrics in which a capital letter has a completely different meaning than a lowercase letter.


It turns out that what I was looking for was simply to pass rules=[] to the tokenizer…
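For anyone landing here later, this is a rough sketch of why an empty rules list helps. It is not fastai's code: `tokenize`, `mark_capitals_and_lowercase`, and `SMILES_PATTERN` are hypothetical names made up for illustration, and the single rule is only a simplified stand-in for the kind of capitalization handling that produces xxmaj. The point is just that when the rule list is empty, nothing rewrites the tokens, so case is preserved.

```python
import re

TK_MAJ = "xxmaj"  # marker token placed before a capitalized token

# Hypothetical, simplified SMILES splitter: two-letter halogens and
# bracket atoms are kept whole, everything else is one character.
SMILES_PATTERN = re.compile(r"Cl|Br|\[[^\]]+\]|.")


def mark_capitals_and_lowercase(tokens):
    """Stand-in rule: emit xxmaj before capitalized tokens, then lowercase them."""
    out = []
    for tok in tokens:
        if tok[:1].isupper():
            out.append(TK_MAJ)
        out.append(tok.lower())
    return out


def tokenize(text, rules=None):
    """Split text into tokens, then apply each rule in order.

    rules=None falls back to the default rule (assumed here to be the
    capitalization rule above); rules=[] applies no rule at all.
    """
    tokens = SMILES_PATTERN.findall(text)
    if rules is None:
        rules = [mark_capitals_and_lowercase]
    for rule in rules:
        tokens = rule(tokens)
    return tokens


if __name__ == "__main__":
    smiles = "c1ccccc1Cl"  # chlorobenzene: aromatic carbons are lowercase
    print(tokenize(smiles))            # default rule: case is lost
    print(tokenize(smiles, rules=[]))  # no rules: case is preserved
```

With the default rule, the aromatic `c` and the `C` in `Cl` can no longer be told apart except through the extra xxmaj marker. With `rules=[]`, the tokens come out exactly as written in the SMILES string.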