How to productionize your models?

Let’s say I’ve trained a language model with decent perplexity (below 45 or so), and it performed very well when fine-tuned for text classification on one dataset. Now I want to “package” it as a Python library that people can use in the simplest way possible. It should feel like using fastText or nltk for classification, just:

  1. pip install leLibrary
  2. In Python, load the model and embeddings
  3. Specify the paths to the dataset
  4. Sit and wait for it to train
  5. The trained model outputs predictions for the validation set
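
To make the steps above concrete, here is a rough sketch of what such an interface could look like. The class name `TextClassifier` and its methods are hypothetical, and the naive Bayes inside is only a dependency-free stand-in for the real fine-tuned language model, so treat it as an illustration of the API shape rather than of the model itself.

```python
import csv
import math
from collections import Counter, defaultdict


class TextClassifier:
    """Hypothetical fastText-style facade; the naive Bayes inside is only a stand-in."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of documents
        self.vocab = set()

    @staticmethod
    def _tokenize(text):
        # Whitespace split is a placeholder; Thai text needs a real word segmenter.
        return text.lower().split()

    def fit(self, rows):
        """Train from an iterable of (label, text) pairs."""
        for label, text in rows:
            tokens = self._tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def train(self, path):
        """Train from a CSV file with rows of the form: label,text."""
        with open(path, newline="", encoding="utf-8") as f:
            return self.fit((row[0], row[1]) for row in csv.reader(f) if len(row) >= 2)

    def predict(self, text):
        """Return the most likely label under add-one smoothed naive Bayes."""
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            # The extra +1 reserves probability mass for words never seen in training.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            for token in self._tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With something like this, usage would boil down to `TextClassifier().train("train.csv")` followed by `predict(...)` on each validation example, where `train.csv` is a made-up path.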

All of this should work without too many dependencies such as PyTorch, because I figure having to download about 700 MB of embeddings is already an annoying step in itself.
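
One way to soften the download step might be to fetch the embeddings lazily on first use and cache them locally, so `pip install` stays small. This is only a sketch using the standard library: the URL, the file name, and the `~/.lelibrary` cache directory are all placeholders, not real locations.

```python
import os
import urllib.request

# Placeholder URL and cache directory, not real hosting locations.
EMBEDDINGS_URL = "https://example.com/thai_embeddings.bin"
DEFAULT_CACHE_DIR = os.path.join(os.path.expanduser("~"), ".lelibrary")


def ensure_embeddings(url=EMBEDDINGS_URL, cache_dir=DEFAULT_CACHE_DIR,
                      fetch=urllib.request.urlretrieve):
    """Return a local path to the embeddings, downloading only if not cached."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(target):
        partial = target + ".part"
        fetch(url, partial)          # download under a temporary name first
        os.replace(partial, target)  # rename only after the download finished
    return target
```

The `fetch` parameter exists only so the downloader can be swapped out, for example in tests or to add a progress bar; by default it uses `urllib.request.urlretrieve`.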

Really appreciate your suggestions!

The model I’m trying to productionize: Language Model in Thai