As a novice in the field of ML, and having made a significant investment in building and installing my own ML rig, I found it very difficult to know whether my Ubuntu setup is optimal or not. There seems to be growing interest in the idea (@SakvaUA), so I thought I'd start this thread and see if we can come up with something together.
To start, @anandsaha suggested checking out the TensorFlow performance benchmarks (thanks!). After having a look, I think we'd need to change them a bit to use a dataset that is small and already available in scikit-learn (e.g. MNIST) instead of ImageNet… I do like their methodology section, though:
This script was run on the various platforms to generate the above results. High-Performance Models details techniques in the script along with examples of how to execute the script.
In order to create results that are as repeatable as possible, each test was run 5 times and then the times were averaged together. GPUs are run in their default state on the given platform. For NVIDIA® Tesla® K80 this means leaving on GPU Boost. For each test, 10 warmup steps are done and then the next 100 steps are averaged.
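To make that protocol concrete, here is a rough sketch of how the warmup-and-average loop could look in Python. The `train_step` callable is a hypothetical placeholder for whatever fixed workload we eventually agree on, and the counts simply mirror the numbers quoted above, not a final decision:

```python
import time

def benchmark(train_step, n_warmup=10, n_measured=100, n_runs=5):
    """Time `train_step` after warmup, averaged over several runs.

    Mirrors the methodology quoted above: 10 warmup steps, then 100
    timed steps, repeated 5 times and averaged. `train_step` is a
    hypothetical callable doing one fixed unit of work.
    """
    run_averages = []
    for _ in range(n_runs):
        for _ in range(n_warmup):          # warmup steps, not timed
            train_step()
        start = time.perf_counter()
        for _ in range(n_measured):        # timed steps
            train_step()
        elapsed = time.perf_counter() - start
        run_averages.append(elapsed / n_measured)
    return sum(run_averages) / n_runs      # mean seconds per step across runs
```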
So what I'm thinking is that, to make the tool user-friendly, we could create a simple page where someone can:
- get instructions on how to set up the machine following a standard methodology (e.g. using Anaconda 3, a virtual environment specifying the exact libraries and versions to install, no mods or tweaks via nvidia-smi, etc.)
- download a Python script they can run easily (a rough sketch of such a script follows after this list)
- enter the details of their machine in a simple interface (processor, RAM, GPU, motherboard, etc.) plus the time it took the script to run (perhaps measured with timeit…)
- and at the end see how their time compares against pre-benchmarked times from "validated" machines with similar (or close to similar) specs
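As a starting point for discussion, here is a minimal sketch of what the downloadable script might look like. Everything in it is an assumption to be debated: I've used scikit-learn's bundled digits dataset as a stand-in for MNIST (so the script needs no download), a small MLPClassifier as the fixed workload, `timeit` for the timing, and a made-up set of fields in the printed result:

```python
import json
import platform
import timeit

from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

def run_once():
    # Fixed workload: train a small MLP on the bundled digits dataset.
    # (A stand-in for MNIST; the real script would pin the dataset,
    # model, and hyperparameters so every machine runs identical work.)
    X, y = load_digits(return_X_y=True)
    model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=50, random_state=0)
    model.fit(X / 16.0, y)   # pixel values are 0-16, so scale to 0-1

if __name__ == "__main__":
    # Repeat the workload a few times and report the best time, as the
    # timeit docs suggest, to reduce noise from other processes.
    times = timeit.repeat(run_once, repeat=3, number=1)
    result = {
        "machine": platform.machine(),
        "processor": platform.processor(),
        "python_version": platform.python_version(),
        "best_time_s": min(times),
        "all_times_s": times,
    }
    # The user would paste this output (plus GPU/RAM/mobo details) into the page.
    print(json.dumps(result, indent=2))
```

Obviously the real workload would need to exercise the GPU (the scikit-learn MLP is CPU-only), so this is mostly about agreeing on the shape of the script: fixed data, fixed model, fixed seeds, and a machine-readable result the page can compare against the reference times.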
Anyone interested in working on this idea?
I'll try to whip up a quick-and-dirty version of the page in the coming period, but I'll definitely need help from more knowledgeable people on the methodology and the standard Python script to be run…
Also feel free to let me know if you think creating the benchmark isn't a feasible ask, because of issues or un-benchmarkable variables I'm not aware of, so I don't waste the time…