Training a language model from scratch.
I am using the following code to create a databunch for a ~60k train+test set, but the session always crashes with "session crashed after using all available RAM". Is there an iterative way to create the databunch in chunks?
data_lm = TextLMDataBunch.from_folder(path='./HindiDataset/', tokenizer=tokenizer, vocab=hindi_vocab)
I am using SentencePiece for tokenization.