I want to build a language model. To do this, I load the data (a lot of it!) with the data block API like this:
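```python
from fastai.text import *  # fastai v1

data = (TextList.from_folder("./corpus/extracted/", recurse=True)
        .random_split_by_pct(0.1)   # hold out 10% of the texts for validation
        .label_for_lm()             # language-model labels (next-token targets)
        .databunch(bs=bs))          # bs: batch size, defined earlier
```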
However, I keep getting out-of-memory (OOM) errors, so I suspect I need to limit the vocabulary size. But `from_folder` does not accept a `max_vocab` argument, and passing one raises an exception. Where in the data block API do I set it?
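For context, a vocabulary cap matters for memory because both the embedding matrix and the output softmax layer of a language model grow with the number of tokens in the vocabulary. The plain-Python sketch below illustrates what a `max_vocab`-style cap typically does: keep the special tokens plus the most frequent tokens (subject to a minimum frequency) and map everything else to an unknown token. It is not fastai's implementation; the function names `build_capped_vocab` and `numericalize`, the `min_freq` default, and the `xxunk`/`xxpad` special tokens are illustrative assumptions.

```python
from collections import Counter

def build_capped_vocab(token_lists, max_vocab, min_freq=2, specials=("xxunk", "xxpad")):
    """Hypothetical helper: return an index-to-string list of at most max_vocab tokens.

    Special tokens come first, followed by tokens seen at least min_freq times,
    most frequent first (ties keep first-seen order).
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    frequent = [tok for tok, c in counts.most_common()
                if c >= min_freq and tok not in specials]
    itos = list(specials) + frequent
    return itos[:max_vocab]  # the cap: drop everything past max_vocab entries

def numericalize(tokens, itos, unk="xxunk"):
    """Hypothetical helper: map tokens to ids; tokens outside the vocab become unk."""
    stoi = {tok: i for i, tok in enumerate(itos)}
    return [stoi.get(tok, stoi[unk]) for tok in tokens]

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
itos = build_capped_vocab(docs, max_vocab=4)
print(itos)                                      # ['xxunk', 'xxpad', 'the', 'cat']
print(numericalize(["the", "dog", "sat"], itos))  # [2, 0, 0]
```

Here `sat` appears twice but is still mapped to `xxunk`, because the cap of 4 entries is reached before it.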