Let’s say I’ve trained a language model with decent perplexity (below 45 or so), and it performed very well when fine-tuned for text classification on one dataset. Now I want to “package” it as a Python library that people can use in the simplest way possible. It should feel like using fastText or NLTK for classification, just:
- pip install leLibrary
- In python, load the model and embeddings
- Specify paths to dataset
- Sit and wait for it to train
- Trained model gives out predictions for validation
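The steps above suggest a very small user-facing API. Here is a minimal sketch of what that wrapper might look like; the class name, method names, and cache directory are all assumptions for illustration, not an existing library, and the training/prediction bodies are placeholders for the real fine-tuning logic:

```python
# Hypothetical sketch of the user-facing API (names are assumptions).
# Goal: hide model/embedding loading behind the constructor and expose
# fit/predict the way fastText does.

from pathlib import Path


class ThaiClassifier:
    """Sketch of the wrapper such a library might expose."""

    def __init__(self, model_dir="~/.lelibrary"):
        # Where pretrained weights/embeddings would be cached after a
        # one-time download (kept out of the pip package itself).
        self.model_dir = Path(model_dir).expanduser()
        self.labels = []

    def fit(self, train_path, valid_path=None):
        # Placeholder: a real implementation would fine-tune the
        # pretrained language model on the dataset at train_path and
        # report validation metrics from valid_path.
        self.labels = ["pos", "neg"]  # learned from the data in practice
        return self

    def predict(self, texts):
        # Placeholder: a real implementation would return the model's
        # predicted label for each input text.
        return [self.labels[0] for _ in texts]


clf = ThaiClassifier().fit("train.csv")
print(clf.predict(["ตัวอย่างข้อความ"]))
```

Keeping the surface area to `fit`/`predict` means users never have to touch the underlying framework directly.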
All of this without pulling in heavy dependencies such as PyTorch and fast.ai, because I figure having to download about 700 MB of embeddings is already an annoying enough step in itself.
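One common way to keep the pip package small is to fetch the large embeddings lazily on first use and cache them locally, rather than shipping them inside the wheel. A rough sketch, where the URL, filename, and cache path are placeholders I've made up for illustration:

```python
# Sketch of a lazy-download pattern: the ~700 MB embeddings file is
# fetched once into a cache directory instead of being bundled in the
# package. EMBEDDINGS_URL and CACHE_DIR are placeholder values.

import os
import urllib.request
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "lelibrary"
EMBEDDINGS_URL = "https://example.com/thai_embeddings.bin"  # placeholder


def ensure_embeddings(url=EMBEDDINGS_URL, cache_dir=CACHE_DIR):
    """Download the embeddings once; reuse the cached copy afterwards."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / os.path.basename(url)
    if not target.exists():
        # One-time download on first use; skipped on later calls.
        urllib.request.urlretrieve(url, target)
    return target
```

The library's `load`/`fit` entry points would call `ensure_embeddings()` internally, so the user never sees the download step except as a progress delay the first time.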
Really appreciate your suggestions!
The model I’m trying to productionize: Language Model in Thai