That is an interesting question and would be worth researching. I do not have an answer myself.
I use nvidia-smi -l to see what my GPU is up to, but this only gives you basic information. I think that newer versions of Keras started preallocating all the GPU memory, so it doesn't tell you much there, but with torch you can see how much GPU memory is actually used by your process, which can be quite handy.
There are also other considerations that go into choosing the batch size, so a heuristic of always cramming as many training examples into a batch as will fit might not be ideal. Still, for this purpose - a cursory glance at model size on the GPU given the batch size and other parameters - nvidia-smi seems to work quite nicely for torch, at least based on my two hours or so of experience with torch so far.
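On the torch side, here is a minimal sketch of checking per-process memory from inside the script itself (assumes a CUDA-capable GPU and a reasonably recent PyTorch; the tensor is just a stand-in for your model and a batch):

```python
import torch

# Stand-in allocation; in practice this would be your model plus a batch of data
x = torch.randn(1024, 1024, device='cuda')

mib = 1024 ** 2
print(f"allocated: {torch.cuda.memory_allocated() / mib:.1f} MiB")      # memory held by live tensors
print(f"peak:      {torch.cuda.max_memory_allocated() / mib:.1f} MiB")  # high-water mark for this process
```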
Shows GPU memory used (you can increase the batch size if it is not fully utilized), GPU utilisation, and the processes using the GPU.
nvidia-settings -q GPUCurrentClockFreqs
Shows the frequency at which the GPU is operating. It should be at peak spec speed when training (it steps down if the GPU heats up beyond its design threshold).
Mine was running at 1.7 GHz:
Yeah, basically I want my nvidia-smi in Jupyter, with some value stored for future reference.
I remember seeing other posts where utilization was recorded into a nice graph with Keras, but I haven't had a chance to find them yet.
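Not the post being remembered, but a minimal sketch of the same idea: poll nvidia-smi from the notebook and plot the samples with matplotlib (assumes nvidia-smi is on the PATH; the one-minute sampling loop is arbitrary):

```python
import subprocess
import time
import matplotlib.pyplot as plt

def gpu_utilization():
    """Return the current GPU utilization in percent, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"])
    return int(out.decode().strip().splitlines()[0])  # first GPU only

samples = []
for _ in range(60):            # sample once per second for a minute while training runs
    samples.append(gpu_utilization())
    time.sleep(1)

plt.plot(samples)
plt.xlabel("seconds")
plt.ylabel("GPU utilization (%)")
plt.show()
```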
You could use
watch -n 1 nvidia-smi so that the usage statistics are refreshed every 1 second.
I think that newer versions of Keras started preallocating all the GPU memory, so it doesn't tell you much there, but with torch you can see how much GPU memory is actually used by your process, which can be quite handy.
Yes, that was a refreshing change with PyTorch. Keras (and I think it trickles down from the TensorFlow backend) would grab the entire GPU memory by default, and you would not know what the actual usage is.
Also, TensorFlow's default setting is to allocate the full GPU memory during the run. Hence, with Keras and TF as the backend, you're most likely to see ~100% of the memory being allocated. This default behavior can be changed by
@jeremy's tip here (Tip: Clear tensorflow GPU memory).
PyTorch is nicer in this respect: it allocates only as much memory as it needs.
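Coming back to the Keras/TF default just mentioned: here is a sketch of the usual way to change it in a TF 1.x / Keras setup (not necessarily the exact tip linked above) by enabling allow_growth so TF only grabs memory as it needs it:

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # grow GPU memory on demand instead of grabbing it all up front
K.set_session(tf.Session(config=config))
```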
@anandsaha just beat me to my comment!
nvidia-smi dmon is much more helpful for seeing how well your GPU is utilized. Look at the 'sm' column there.
Let’s try to stick to discussing the modules we’re using in this course - i.e. Pytorch - since otherwise it’ll get pretty confusing!
Can someone let me know their modules and driver versions…
I have downloaded them three times, but they are always reported as incompatible by the (CUDA) installer.
I am not sure exactly what you are looking for, but if you want to install the CUDA driver, start here:
Choose your OS and architecture (mostly x86_64) and you will get the file to download.
For PyTorch installation, go to pytorch.org. Most likely you will want the CUDA 8.0 build.
For fast.ai, you will git clone https://github.com/fastai/fastai.git. But since it will be updated with new material before each class, you will need to do a git pull each time.
These are very general pointers. Hope they are useful.
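One quick sanity check once things are installed (a sketch, assuming a working PyTorch install): print the versions PyTorch was built against and whether it can see the GPU.

```python
import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)       # e.g. '8.0'
print("cuDNN:", torch.backends.cudnn.version())
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```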
The problem is fixed now…
Stumbled upon that post, it seems. Here is the notebook which does it (plots GPU utilization in the notebook itself).
(The embedded notebook file has been truncated; its opening markdown cells read:)

# Improving the speed of augmented data training using Keras 2
As a complete data science newbie, I decided that it would be helpful to use the latest and greatest Anaconda/Python/Keras/cudnn rather than the official part 1 environment (Python 2.7 / Keras 1 / cudnn7).
The switch is fairly straightforward, with Keras 2 providing the most headaches (they changed the API without providing a fully backward compatible one).
In this notebook, I am using Python 3 versions of Jeremy's utils.py and vgg16_bn.py. I'm currently about halfway through the course and "ported" the first three lessons - my utils.py might still have incompatibilities I'm not aware of.

## Rationale for this experiment
I started this course using an old Z800 (24 GB memory, 2x6 2.8 GHz cores) with a GTX 1080Ti graphics card. Given that the GPU is one of the fastest ones around, I hoped to get fast training speeds - but alas, performance was disappointing. I switched to a Z640 (32 GB memory, 4 3.6 GHz cores) and got improved performance - but I still did not understand why, as my old Z800 should have done a good enough job.
As Jeremy encourages experiments, I decided to figure out what the fundamental problem was. This notebook summarizes the problem and solution; I used my Z640, but I don't doubt that I would have been able to get similar performance on the 'old' Z800.

## Cats vs dogs without data augmentation
Awesome, thanks for finding that!
I like using this:
watch -n 0.5 nvidia-smi
I run this in a tmux pane and it automatically refreshes the output of nvidia-smi.
watch is great. You can also do:
watch -n 1 free -m to track the regular memory usage while running a model…
Cool, put it in another tmux pane. Beautiful!
I am using this to glance at the CPU/GPU utilization from the desktop.
If I have 3 GPUs, how do I tell PyTorch to use GPU 2?
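Not from the thread, but a sketch of the usual options (assuming a reasonably recent PyTorch and that GPU 2 exists): either hide the other GPUs from the process, or pick the device explicitly in code.

```python
import torch

# Option 1: set CUDA_VISIBLE_DEVICES=2 in the shell (or via os.environ before
# importing torch) so the process only ever sees that GPU:
#   $ CUDA_VISIBLE_DEVICES=2 python train.py

# Option 2: keep all GPUs visible and select one explicitly in code.
torch.cuda.set_device(2)                   # assumes at least 3 visible GPUs
x = torch.randn(10, 10, device="cuda")     # "cuda" now refers to the current device, i.e. GPU 2
print(x.device)
```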