I want to build a language model. To do this, I am trying to load data (a lot of it!) using the data block API like this:
data = (TextList.from_folder("./corpus/extracted/", recurse=True)
        .random_split_by_pct(0.1)
        .label_for_lm()
        .databunch(bs=bs))
However, I keep getting OOM errors, so I guess I need to limit my vocabulary size. But from_folder does not seem to accept a
max_vocab argument: passing one raises an exception. Where do I plug it into the data block API?
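To be clear about what I mean by limiting the vocabulary, here is a rough plain-Python sketch of the behaviour I am after. It does not use fastai at all: the helper names (build_capped_vocab, numericalize) and the unknown-token string are made up for illustration, and I am only assuming that a max_vocab option would work roughly like this, keeping the most frequent tokens and mapping everything else to a single unknown id.

```python
from collections import Counter

# Placeholder unknown-token string; an assumption made for this sketch only.
UNK = "xxunk"

def build_capped_vocab(tokens, max_vocab):
    """Keep the max_vocab most frequent tokens; index 0 is reserved for UNK."""
    counts = Counter(tokens)
    itos = [UNK] + [tok for tok, _ in counts.most_common(max_vocab)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    """Map each token to its id; tokens outside the vocabulary become 0 (UNK)."""
    return [stoi.get(tok, 0) for tok in tokens]

tokens = "the cat sat on the mat the cat".split()
itos, stoi = build_capped_vocab(tokens, max_vocab=2)
ids = numericalize(tokens, stoi)
print(itos)
print(ids)
```

With a cap like this the vocabulary size stays fixed no matter how large the corpus gets, which is what I am hoping will help with memory.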