I am trying to modify fastai so that it can run augmentation transforms on text data, as it already does for vision. For example, randomly removing a word from an input sentence.
Ideally I would want to call:
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab, tfms=[rand_remove(p=.3, max_n=1)], bs=42)
Where in the code is the right place to do this? Adding a preprocessor to TextDataBunch and pass it to
You should change the apply_tfms method of TextList (or rather, create one). This is what is called behind the scenes when you use .transform in the data block API.
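To make the idea concrete, here is a minimal, library-independent sketch of what such a transform and an apply_tfms hook could look like. Everything in it is an assumption for illustration: rand_remove is the hypothetical transform from the question (fastai does not ship it), AugmentableText is a stand-in class and not fastai's own text item, and the rng parameter exists only to make the behaviour reproducible.

```python
import random


def rand_remove(p=0.3, max_n=1, rng=None):
    """Hypothetical transform factory: with probability p, drop max_n random
    tokens (fewer if the sentence is too short to keep at least one token)."""
    rng = rng or random.Random()

    def _tfm(tokens):
        tokens = list(tokens)
        # Leave very short inputs alone, and skip the transform with probability 1 - p.
        if len(tokens) <= 1 or rng.random() >= p:
            return tokens
        n = min(max_n, len(tokens) - 1)
        drop = set(rng.sample(range(len(tokens)), n))
        return [t for i, t in enumerate(tokens) if i not in drop]

    return _tfm


class AugmentableText:
    """Stand-in for a single text item (NOT the fastai class)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def apply_tfms(self, tfms):
        # Apply each transform in order and return a new item,
        # leaving the original tokens untouched.
        tokens = self.tokens
        for tfm in tfms:
            tokens = tfm(tokens)
        return AugmentableText(tokens)
```

In an actual fastai setup the same logic would have to work on whatever the item stores (for example numericalized ids rather than string tokens), so treat this only as the shape of the hook, not as a drop-in patch.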
The factory method doesn’t know of any tfms argument, so you should also use the data block API to build your
I think I have to change it on the single Text element if I want to modify self.text. Regardless, I’ve been doing everything on disk, because it’s faster and I don’t want to retokenize during training.
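For anyone wanting to try the on-disk route, a rough sketch of offline augmentation might look like the following. It is not the code I actually ran: the (label, text) row layout, the "label"/"text" header names, and the helper names drop_one_word, augment_rows and write_augmented_csv are all assumptions made up for this example.

```python
import csv
import io
import random


def drop_one_word(text, rng):
    """Remove one randomly chosen whitespace-separated word from text."""
    words = text.split()
    if len(words) <= 1:
        return text  # nothing sensible to remove
    del words[rng.randrange(len(words))]
    return " ".join(words)


def augment_rows(rows, n_copies=1, seed=0):
    """Yield each (label, text) row followed by n_copies perturbed copies."""
    rng = random.Random(seed)
    for label, text in rows:
        yield label, text
        for _ in range(n_copies):
            yield label, drop_one_word(text, rng)


def write_augmented_csv(rows, out_file, n_copies=1, seed=0):
    """Write the original and augmented rows to an open file as CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["label", "text"])  # assumed column names
    writer.writerows(augment_rows(rows, n_copies, seed))
```

The resulting file can then be tokenized once, like any other CSV. The trade-off is that the augmented copies are fixed up front, so the model sees the same perturbed sentences every epoch instead of fresh ones.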