Google Colab: session crash while creating a data bunch

I am training a language model from scratch. I'm using the following code to create a databunch for a ~60k train+test set, but the session always crashes with "Your session crashed after using all available RAM". Is there an iterative way to create the databunch in chunks?

data_lm = TextLMDataBunch.from_folder(path='./HindiDataset/', tokenizer=tokenizer, vocab=hindi_vocab)

I'm using SentencePiece for tokenization.
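To illustrate the kind of chunked processing I'm asking about, here is a minimal, self-contained sketch in plain Python. It is not fastai API, just the general idea: tokenize the corpus lazily, a chunk at a time, so peak memory is bounded by the chunk size rather than the whole dataset. `toy_tokenize` is a hypothetical stand-in for the real SentencePiece tokenizer. (If I remember correctly, the csv/df factory methods like `TextLMDataBunch.from_csv` expose a `chunksize` argument that does something like this internally, but `from_folder` does not seem to.)

```python
from typing import Iterator, List


def toy_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a SentencePiece tokenizer.
    return text.split()


def tokenize_in_chunks(texts: List[str], chunksize: int) -> Iterator[List[List[str]]]:
    """Yield tokenized documents one chunk at a time, so only
    `chunksize` documents are held in memory at once."""
    for start in range(0, len(texts), chunksize):
        chunk = texts[start:start + chunksize]
        yield [toy_tokenize(t) for t in chunk]


corpus = [f"doc {i} some text" for i in range(10)]
chunks = list(tokenize_in_chunks(corpus, chunksize=4))
print([len(c) for c in chunks])  # → [4, 4, 2]
```

Something like this inside the databunch creation (tokenize a chunk, numericalize it, free it, move on) is what I'm hoping already exists.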
