Show GPU utilization metrics inside training loop (without subprocess call!)

marcmuc · October 17, 2018, 10:15am

When I am running training I always run watch -n1 nvidia-smi, (or thanks to Jeremy’s hint now nvidia-smi dmon).

I would love to see that integrated in the fastai training loop, but obviously using subprocess with nvidia-smi would be less than ideal so I researched this a bit.

nvida-smi apparently is only a wrapper around the nvml c-lib (NVIDIA Management Library). nvidia provides python bindings for that lib and a demo smi.py , so we can directly call this from within python, no need to run nvidia smi as a system process.

so pip install nvidia-ml-py3 and then (quick example):

import nvidia_smi

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
# card id 0 hardcoded here, there is also a call to get all available card ids, so we could iterate

res = nvidia_smi.nvmlDeviceGetUtilizationRates(handle)
print(f'gpu: {res.gpu}%, gpu-mem: {res.memory}%')

This could be integrated into the fastai library itself (if this is of interest I can give it a try) or of course could be used in callbacks etc. individually.

Just wanted to bring it to the attention of people that this is possible.

Additional info:
The original nvidia version (pip install nvidia-ml-py) only supports python2, the version used above (nvidia-ml-py3) is a patch by a user, see github)

jeremy · October 17, 2018, 1:24pm

Very cool - that’s a new one to me. I think you could create some pretty cool callback with this! e.g. extra statistics for the dynamic graph callback to show during training…

maxim.pechyonkin · October 17, 2018, 1:33pm

This is a very useful tip. Thanks!

poppingtonic · October 17, 2018, 1:36pm

Goodbye watch nvidia-smi in a separate tmux pane!

antorsae · October 17, 2018, 2:03pm

FWIW this is what I now use instead of watch -n1 nvidia-smi:

but I agree that callback approach would be great, leading to the potential maxbs_find to find the biggest batch size you can fit in your model.

albanie · October 18, 2018, 12:12am

Possibly worth noting that if you want to see that the same numbers that nvidia-smi reports, you might want to use:

mem_res = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print(f'mem: {mem_res.used / (1024**2)} (GiB)') # usage in GiB
print(f'mem: {100 * (mem_res.used / mem_res.total):.3f}%') # percentage usage

rather than

res = nvidia_smi.nvmlDeviceGetUtilizationRates(handle)
print(f'gpu: {res.gpu}%, gpu-mem: {res.memory}%')

They report slightly different numbers - there is a discussion here with the details: https://github.com/NVIDIA/nvidia-docker/issues/220

mobius · October 18, 2018, 7:41am

Also netdata (https://github.com/netdata/netdata) produces some useful graphs on nv usage.

maxim.pechyonkin · October 18, 2018, 2:37pm

How do you get those graphs? Is it a third-party application?

stas · October 19, 2018, 12:13am

It would be wonderful, but we would need it to be on conda as well, and currently I see only nvidia-ml-py on conda. If you could ask that person who ported it to put it on conda (not conda-forge) that would be helpful.

It’s pretty easy to port pypi’s setup.py to conda: this will create the recipe for conda-build.

conda skeleton pypi nvidia-ml-py3

full explaination is here

But I see neither pypi has a wheel built, it’s just a tarball.

It’s a pure python module, correct? despite using ctypes

So it should then be noarch for the conda-package.

And it doesn’t seem to work via main:

$ python nvidia_smi.py
Traceback (most recent call last):
  File "nvidia_smi.py", line 873, in <module>
    print(XmlDeviceQuery())
  File "nvidia_smi.py", line 228, in XmlDeviceQuery
    strResult += '    <product_name>' + nvmlDeviceGetName(handle) + '</product_name>\n'
TypeError: must be str, not bytes

mobius · October 19, 2018, 6:08am

It’s netdata (see the link in my previous post) that actually produces these graphs. It’s a linux daemon that creates a small web server containing the aforementioned graphs which are usually from per second metrics. It also creates graphs for pretty much everything on a linux box, (cpu, memory, iowait etc)

jeremy · October 19, 2018, 12:17pm

That’s an easy fix (just use an py36 f-string). Since this module doesn’t seem very efficient or updated, we’d probably want to fork it (license permitting) and maintain our own version and conda/pip packages.

marcmuc · October 19, 2018, 12:22pm

That’s what I would have thought, and only take the parts that are needed. It’s BSD Licenced, so the copyright notice must be reproduced in the code/docs. PyPI lists this in the official NVIDIA version:

COPYRIGHT

LICENSE

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the NVIDIA Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

plus the usual ‘as-is’ disclaimer which I ommited here…

stas · October 23, 2018, 10:57pm

This and gputil are now documented here:
https://docs.fast.ai/dev/gpu.html#accessing-nvidia-gpu-info-programmatically.