Building a benchmark for model training times across platforms

As a novice in the field of ML, and having made a significant investment in building and setting up my own ML rig, I found it very difficult to know whether my Ubuntu setup is optimal or not. There seems to be growing interest in the concept (@SakvaUA), so I thought I’d start this thread and see if we can come up with something together.

To start, @anandsaha suggested checking out the TensorFlow performance benchmarks (thanks!). After having a look, I think we’d need to change them a bit to use a dataset that is small enough and already available in scikit-learn (e.g. MNIST) instead of ImageNet. I do like their methodology section, though:

This script was run on the various platforms to generate the above results. High-Performance Models details techniques in the script along with examples of how to execute the script.

In order to create results that are as repeatable as possible, each test was run 5 times and then the times were averaged together. GPUs are run in their default state on the given platform. For NVIDIA® Tesla® K80 this means leaving on GPU Boost. For each test, 10 warmup steps are done and then the next 100 steps are averaged.
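To make that concrete, here’s a minimal sketch of the warmup-then-average pattern in plain Python; `step_fn` is a placeholder for a single training step (e.g. one mini-batch on MNIST), not anything taken from the TensorFlow scripts themselves:

```python
import time

def time_steps(step_fn, n_warmup=10, n_timed=100):
    """Discard n_warmup warmup steps, then average the next n_timed steps."""
    for _ in range(n_warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_timed):
        step_fn()
    return (time.perf_counter() - start) / n_timed

def run_benchmark(step_fn, n_runs=5):
    """Repeat the timed test n_runs times and average the results."""
    return sum(time_steps(step_fn) for _ in range(n_runs)) / n_runs

# Dummy stand-in for one training step, just to show the flow.
dummy_step = lambda: sum(i * i for i in range(100_000))
print(f'average step time: {run_benchmark(dummy_step):.4f}s')
```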

So what I’m thinking is that, to make the tool user-friendly, we could create a simple page where someone can:

  • get information on how to set up their machine following the standard methodology (e.g. using Anaconda 3, a virtual environment where we specify the exact libraries and versions to install, no mods or tweaks via nvidia-smi, etc.)
  • download a Python script which they can run easily
  • enter the details of their machine (processor, RAM, GPU, mobo, etc.) in a simple interface, along with the time it took the script to run (perhaps measured with timeit; see the sketch after this list)
  • and at the end get a mapping of their time against pre-benchmarked times on “validated” machines with similar (or close to similar) specs
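On the timing point in the list above, here’s a rough sketch of how timeit could capture the overall run time; `benchmark_script` and `run_training` are hypothetical names, since the actual script is still to be written:

```python
import timeit

# Hypothetical entry point of the downloaded benchmark script.
from benchmark_script import run_training

# timeit.timeit accepts a callable, disables garbage collection while
# timing, and returns the total wall-clock time for `number` executions.
elapsed = timeit.timeit(run_training, number=1)
print(f'training run took {elapsed:.1f}s')
```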

Anyone interested in working on this idea?

I’ll try to whip up a quick-and-dirty version of the page in the coming period, but I’ll definitely need help from more knowledgeable people on the methodology and the standard Python script to be run…

Feel free to let me know as well if you think creating the benchmark isn’t a feasible ask due to some issues or un-benchmarkable variables which I’m not aware of, so I don’t waste the time…


Great initiative @aragalie!

I will give the scripts a spin sometime this week and report my observations here.


Awesome @anandsaha, appreciate your time and efforts!

Following the earlier example, I figured it would be easiest to have the tool on git, inside a simple Jupyter notebook 🙂

My current rough ideas:

  • we could host a simple database of times in an AWS S3 bucket (I can host it on my account) for cost-effective delivery
  • we could potentially gather the system specs via CLI commands (e.g. uname -a, sudo lshw -class cpu, etc.); if parsing that proves to be too much of a hassle, then maybe we make a function with arguments corresponding to the various items (e.g. my_processor, where IPython shows all the available options the user can pass to it, then my_OS, my_RAM, etc.); see the sketch after this list
  • then we can have a section which sets up all the requirements for the unified methodology (a new conda env, library installation, etc.) to make sure users can easily reproduce the setup
  • then we have the actual testing script part (ideally using a dataset already set up in the previous step, e.g. MNIST) -> the user gets their time
  • then we can have a simple plot showing where the user’s machine sits in the benchmark (using the data downloaded earlier)
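For the spec-gathering idea, here’s a rough sketch of what the CLI approach could look like, using Python’s built-in platform module plus nvidia-smi for the GPU name (NVIDIA-only; exactly which fields we’d collect is still open):

```python
import platform
import subprocess

def get_specs():
    """Collect basic machine specs for the benchmark submission."""
    uname = platform.uname()
    specs = {
        'os': f'{uname.system} {uname.release}',
        'cpu': uname.processor or uname.machine,
        'python': platform.python_version(),
    }
    # Query the GPU name via nvidia-smi, if it is installed.
    try:
        out = subprocess.run(
            ['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
            capture_output=True, text=True, check=True)
        specs['gpu'] = out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        specs['gpu'] = 'unknown'
    return specs

print(get_specs())
```

RAM and motherboard details would still need something like lshw or dmidecode, which is where the parsing hassle comes in.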

If anyone has better ideas or has done something like this before, please share what you learned…

I’ve also set up a Slack channel in case anyone wants to chat about this.

PS: I’m using all these tools for the first time, so please bear with me in case I do/say something really stupid 🙂
