@cedric You should take a look at clipper.ai which @QWERTY1 recently shared. It’s out of the Berkeley RISE lab and is a very well thought out framework for model serving as an API. The website doesn’t really do the framework justice in my mind and the videos are definitely worth looking at. It’s very similar to what you’ve layed out, but has a few more details outlined. It looks like you’ve thought of some other aspects as well so it may be worthwhile joining forces and contributing your ideas/work.
I’m currently trying to convince my company to adopt it for model serving so that we can work on it and help improve it but so far I’ve been very impressed with what it does and their roadmap.