How to save the model in a Database Table

Hi,

I was wondering if there are methods to save a trained Keras/ML model in a database table so that it is faster to retrieve the model at prediction time?

Faster as in taking less wall-clock time or CPU, or something like that? Maybe I misunderstand your idea, but just putting something in a (relational?) database is likely to slow reads down rather than speed them up, because there are overheads on top of just reading straight from disk or cache, such as those imposed by providing facilities like relational consistency.

The following is based on experience of programming but not (yet) deep learning:

The most obvious way to make inference faster if loading the model is the bottleneck is to ensure the model is already in memory when the starting pistol is fired, so to speak. If that’s on CPU and therefore regular system RAM, there might be multiple ways to achieve that: e.g.

  1. Just start up your program, have it read the model right away when it starts, and then just don’t exit. For example, it could respond to HTTP requests, or just work through a queue of inference requests (a database is one reasonable way to queue up requests); see the sketch after this list.

  2. Mount a ‘ram disk’ and read your model from there.

  3. Ensure your operating system is caching the data. It’s quite likely you’ll find the second and subsequent runs of your program are faster for this reason unless you explicitly flush OS-level caches (e.g. via fiddling with certain special files in the /proc filesystem on Linux).
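To make option 1 concrete, here is a minimal sketch of that kind of long-running worker. The model file name and get_next_request() are placeholders for whatever you actually have (an HTTP framework, a message queue, a database table used as a queue, ...):

```python
# Sketch of option 1: pay the model-loading cost once at startup, then keep the
# process alive and reuse the in-memory model for every prediction request.
from keras.models import load_model

model = load_model("my_model.h5")  # hypothetical path; loaded once, up front

def get_next_request():
    # Placeholder: fetch the next batch of inputs from wherever requests arrive
    # (an HTTP endpoint, a message queue, a database table used as a queue, ...).
    raise NotImplementedError

while True:
    batch = get_next_request()          # blocks until there is work to do
    predictions = model.predict(batch)  # no loading cost here, model is in RAM
    # ... hand the predictions back to whoever asked for them ...
```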

@lateralplacket Thanks for your response. Maybe more context will help here.

When there are multiple models (for scalability) and we store our trained models on the filesystem rather than in a database table, there is usually a delay reading them from the filesystem at prediction time.
Saving time can be disregarded, since saving happens offline and doesn’t matter much in a real scenario.
My concern is loading the saved model at prediction time, which is very critical.
My thinking was that loading from a database table helps because it is transactional.

Database tables are, in the end, implemented as disk files, with caching in memory. So, if you were to pick a database system, load your models into it, and find that it sped up your predictions, what would you conclude from that?

If it’s faster, that’s likely because the model got cached in RAM by some means or other (i.e. it got loaded into RAM and stayed there for later reuse). So I would advise skipping the database part and just loading the model into memory explicitly (are there multiple models? I can’t tell whether you’re talking about training or prediction there). I don’t know why you say that loading from a database table helps because it is transactional, but unless I’ve badly misunderstood something, you’re wrong about that: transactions cost extra CPU work, memory usage, and latency (you may end up waiting for other transactions).

Again, that’s without experience yet of deploying any real deep learning inference code, so I don’t know in detail how things pan out practically with common frameworks etc.

Why not give it a try and see: write a very simple Python script that does model = load_my_model() at the start, and then just has a hardcoded Python list of data (filenames? raw data? whatever you have) to do your predictions on. Loop through the list and predict every item. Time three runs of that and take the best of the three. Is that fast enough? If so, problem solved.
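If it helps, here is roughly what that could look like. The model path, the input shape, and the random stand-in data are all assumptions for the example, not anything specific to your setup:

```python
# Rough timing sketch: load the model once, then time repeated prediction passes.
import time
import numpy as np
from keras.models import load_model

model = load_model("my_model.h5")  # hypothetical path; loaded once, up front

# Stand-in inputs; in practice this would be your real filenames/arrays,
# and the shape would be whatever your model expects.
inputs = [np.random.rand(1, 224, 224, 3) for _ in range(100)]

def run_once():
    for x in inputs:
        model.predict(x)

# Best of three runs, as suggested above.
times = []
for _ in range(3):
    start = time.perf_counter()
    run_once()
    times.append(time.perf_counter() - start)

print("best of 3: %.3f s" % min(times))
```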

If you’re considering doing your predictions on GPU, take a look at this page for a lot of detailed information about bottlenecks - there is even some discussion about pros and cons of CPU vs. GPU for different inference workloads.

Yes, there are 10+ models.

So how big are they? Maybe you can afford to just load all 10+ into RAM at once?

Again, I don’t know if you plan to use GPU for prediction, though: if so, things are different.

Right now it’s only CPU, but there are plans for GPU in the future as well. The question is what storing the models in a database table brings (as per the earlier posts, seemingly not much) compared with keeping all these pretrained models in a directory as JSON or binary .h5 files.
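If all 10+ models fit in RAM at once, the directory option combines naturally with the “load everything once at startup” advice above. A sketch, assuming Keras’s load_model and an illustrative models/ directory of .h5 files:

```python
# Sketch: load every pretrained .h5 model from a directory into memory once,
# then dispatch predictions by model name. The "models/" layout is an assumption.
import glob
import os
from keras.models import load_model

MODEL_DIR = "models"

# model name -> loaded Keras model, kept in RAM for the life of the process
models = {
    os.path.splitext(os.path.basename(path))[0]: load_model(path)
    for path in glob.glob(os.path.join(MODEL_DIR, "*.h5"))
}

def predict(model_name, batch):
    return models[model_name].predict(batch)
```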

Sounds like you want to set up a separate server for each model and then just query that whenever you want to predict something.

I’m a Python novice, but I was able to reconfigure this script to make a simple server for one of my projects: https://daanlenaerts.com/blog/2015/06/03/create-a-simple-http-server-with-python-3/
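For the record, here is roughly what such a minimal server can look like using only the Python 3 standard library plus Keras. The model path and the JSON request shape ({"inputs": [[...], ...]}) are made up for the example:

```python
# Minimal prediction-server sketch: the model is loaded once when the server
# starts, then every POST request reuses it. Model path and payload format are
# assumptions for the example.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import numpy as np
from keras.models import load_model

model = load_model("my_model.h5")  # hypothetical path; loaded once at startup

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        batch = np.array(payload["inputs"])          # assumed request format
        preds = model.predict(batch).tolist()
        body = json.dumps({"predictions": preds}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
```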

I believe you should read https://www.tensorflow.org/deploy/tfserve
You can reduce the size of your models several times over and increase their speed by transforming the initial Keras model into an optimized version for TensorFlow. It is a useful step even if you will not use TensorFlow Model Server later.

Are you referring to an automated transformation or a manual one?

Manual
Useful article: https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc
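Roughly, the freezing step that article walks through looks like this (TF 1.x-era APIs; the model and output file names are placeholders):

```python
# Sketch of freezing a Keras model's TensorFlow graph to constants, along the
# lines of the linked article (TF 1.x-era APIs). Paths are placeholders.
import tensorflow as tf
from keras import backend as K
from keras.models import load_model

model = load_model("my_model.h5")
sess = K.get_session()

# Replace variables with constants so the whole graph fits in a single .pb file.
output_names = [out.op.name for out in model.outputs]
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)

with tf.gfile.GFile("frozen_model.pb", "wb") as f:
    f.write(frozen_graph_def.SerializeToString())
```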
