Assuming you are using fast.ai, the TextDataBunch will contain your preprocessed text. You can then write it to disk with its save method and restore it later with the load_data function.
You may need to experiment with the different methods of creating the TextDataBunch, although from_folder sounds like the most relevant. A chunksize for the Tokenizer and Numericalizer (processors) can be specified as a parameter.
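As a rough sketch of how this could fit together, assuming a recent fastai v1 release and a folder laid out the way from_folder expects (train and valid subfolders with one directory per class): the helper name load_or_build_databunch, the cache file name, and the chunksize value are my own choices for illustration, not anything fastai prescribes.

```python
from pathlib import Path


def load_or_build_databunch(path, cache_name="data_save.pkl", chunksize=10000):
    """Return a TextDataBunch for `path`, reusing a saved copy when one exists.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Imported lazily so this module can be imported without fastai installed.
    # Assumes fastai v1, where both names live in these modules.
    from fastai.basic_data import load_data
    from fastai.text.data import TextDataBunch

    path = Path(path)

    # DataBunch.save writes the file under the DataBunch's own path, so
    # look for the cached copy there and skip preprocessing if it is present.
    if (path / cache_name).exists():
        return load_data(path, cache_name)

    # First run: tokenize and numericalize the raw text. chunksize sets how
    # many texts the processors handle at a time.
    data = TextDataBunch.from_folder(path, chunksize=chunksize)
    data.save(cache_name)
    return data
```

Calling load_or_build_databunch("data/my_corpus") (a made-up path) would then preprocess the text once and reload the saved file on later runs. If memory is tight during preprocessing, passing a smaller chunksize is probably the first thing to try.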