Lesson 12 (2019) discussion and wiki

I’ve been using ULMFiT for genomics data. You need to write your own processing functions and train your own language models. I needed to write custom functions for:

Tokenizer
Vocab
NumericalizeProcessor
_join_texts
TokenizeProcessor
_get_processor
TextLMDataBunch
TextClasDataBunch

But really that’s all just for turning your data into a form you can feed into the model. Everything after you tokenize/numericalize your data is the same: same AWD-LSTM model, same ULMFiT training process.
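To give a sense of what the custom processing functions look like, here is a minimal sketch of the tokenize/numericalize steps for DNA sequences. This is purely illustrative (a common approach is overlapping k-mers); the function names `kmer_tokenize`, `build_vocab`, and `numericalize` are my own, not the actual implementation:

```python
def kmer_tokenize(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer tokens."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(token_lists):
    """Map each unique token to an integer id, reserving 0 for <unk>."""
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def numericalize(tokens, vocab):
    """Convert tokens to integer ids, falling back to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

# Example:
tokens = kmer_tokenize("ATGCGT", k=3)   # ['ATG', 'TGC', 'GCG', 'CGT']
vocab = build_vocab([tokens])
ids = numericalize(tokens, vocab)        # [1, 2, 3, 4]
```

Once sequences are integer id lists like `ids`, they can be batched and fed to the same AWD-LSTM as any text corpus.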
