Algorithmic data generation for language models

Has anyone tried algorithmically generating training data for language models? I have had a lot of success using transforms to generate augmented data for text classification, but not for training the language models themselves.
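
To make "transforms" concrete, here is a rough sketch of the kind of label-preserving augmentation meant for classification. The function names (random_swap, random_delete, augment) and the choice of swap/delete transforms are illustrative assumptions, not a specific library's API or necessarily the exact transforms used:

```python
import random

def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def augment(text, label, n_copies=3, seed=0):
    """Hypothetical helper: the original example plus n_copies transformed ones.

    The label is copied unchanged, on the assumption that these small
    edits do not alter the class of the text.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    tokens = text.split()
    examples = [(text, label)]
    for _ in range(n_copies):
        transform = rng.choice([random_swap, random_delete])
        examples.append((" ".join(transform(tokens, rng)), label))
    return examples

if __name__ == "__main__":
    for example in augment("the movie was surprisingly good", "positive"):
        print(example)
```

For classification the label carries over unchanged, which is what makes this cheap. For a language model the transformed text is itself the training target, so it is less obvious which transforms are safe.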