Hi! I would like to use a language model to predict characters. My input data are words (one word per line in a .txt file). I've decided to use
TextList to store the values (see code below).
data_lm = (TextList.from_df(df, cols="text")
           .split_by_rand_pct(0.1, seed=101)
           .label_for_lm()
           .databunch(bs=10, num_workers=0))

I know that I need custom preprocessing (including a character-level tokenizer), but I don't know how to implement it. Has anyone tried something like this?
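To make the question concrete, here is a rough, library-independent sketch of the character-level tokenization and numericalization I have in mind. The marker names `<bos>`/`<eos>` and the helpers `char_tokenize` and `build_vocab` are my own placeholders, not fastai API. In fastai v1 I assume this logic would live in a `BaseTokenizer` subclass passed to `Tokenizer(tok_func=...)`, but I have not verified that wiring.

```python
from collections import Counter

# Hypothetical boundary markers; assumed not to occur in the data itself.
BOS, EOS = "<bos>", "<eos>"

def char_tokenize(word):
    """Split one word into characters, wrapped in start/end markers."""
    return [BOS] + list(word.strip()) + [EOS]

def build_vocab(words):
    """Build id-to-token and token-to-id maps, most frequent token first."""
    counts = Counter(tok for w in words for tok in char_tokenize(w))
    itos = [tok for tok, _ in counts.most_common()]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

# Toy data standing in for the one-word-per-line file.
words = ["cat", "car", "cart"]
itos, stoi = build_vocab(words)

# Numericalize one word: a list of integer ids, one per character token.
ids = [stoi[tok] for tok in char_tokenize("cat")]
```

Each word becomes a short sequence of ids, so the language model would predict the next character rather than the next word.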
My main goal is to generate new words that follow the character distribution learned from the training words.
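As a point of comparison for what I mean by "generate", here is a minimal character bigram sampler. This is a simple count-based baseline, not the neural language model I am asking about, and the markers `^`/`$` and the names `fit_bigrams` and `sample_word` are made up for the illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers; assumed absent from the training words.
START, END = "^", "$"

def fit_bigrams(words):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for w in words:
        chars = [START] + list(w) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_word(counts, rng, max_len=20):
    """Sample characters one at a time until END is drawn or max_len is hit."""
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

rng = random.Random(101)
model = fit_bigrams(["cat", "car", "cart"])
samples = [sample_word(model, rng) for _ in range(5)]
```

With a trained character-level language model, the counting step would be replaced by the model's predicted next-character probabilities, while the sampling loop would stay roughly the same.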