Is my GPU being used?

How do I tell if my GPU is being used? I have a hunch that it isn’t currently for a few reasons:

  1. The fan on it isn’t spinning up.
  2. The model is going really slow.

I can’t find a way to verify my assumption though. I looked at nvidia-smi and I don’t see python in the process list, which I think I would see if the GPU were being used?

Thanks in advance for any advice!

Check this thread


Exactly what I was looking for. Thanks for pointing me to it.

1 Like

I am still having trouble with this: I don’t appear to have the torch.cuda.get_device_name(0) command. I tried importing it specifically, but I get the following error when I try to use it.

AttributeError                            Traceback (most recent call last)
<ipython-input-16-35e676d96af5> in <module>()
----> 1 torch.cuda.get_device_name(0)

AttributeError: module 'torch.cuda' has no attribute 'get_device_name'

And the other thing I tried was the following line:

from torch.cuda import get_device_name

This also did not work.

All of the other torch.cuda commands seem to be working, though, which is strange.

Do you have CUDA installed?

e.g. run nvcc --version in your terminal
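You can also check from Python directly. Here is a minimal sketch that degrades gracefully: it reports whether PyTorch is importable at all, and if so whether it can see a CUDA device (the function name cuda_status is just mine for illustration):

```python
import importlib.util

def cuda_status():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch
    if not torch.cuda.is_available():
        return "torch is installed but CUDA is not available"
    # Device 0 is the first GPU that nvidia-smi lists
    return "CUDA available on: " + torch.cuda.get_device_name(0)

print(cuda_status())
```

If this prints the "not available" line while nvcc works, your PyTorch build itself is probably CPU-only.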

1 Like

I am pretty confident that I do. I can run the nvidia-smi command with no issues, and now that I’ve rerun nvidia-smi I actually do see python in the list. So I’m wondering if it is using the GPU, but the fans just aren’t running because of the colder environment?

I get this from nvcc --version, so I am just going to call it good unless things still seem to be running slowly.

Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Hmm, you could just keep polling with watch nvidia-smi -q -g 0 -d UTILIZATION -l, and then run a long-running task in your notebook.

Also depending on your GPU, fans only spin up to keep cool. If your temps are low, the fans will be set to a lower RPM.
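If you’d rather log a number from the notebook itself, something like this works. It’s a sketch that shells out to nvidia-smi with the --query-gpu flags and returns None when the tool isn’t on PATH or the query fails:

```python
import shutil
import subprocess

def gpu_utilization(index=0):
    """Return GPU utilization in percent via nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if out.returncode != 0:
        return None
    # With noheader/nounits the output is just a bare integer like "42"
    return int(out.stdout.strip())

print(gpu_utilization())
```

Call it before and during a training loop; a jump from ~0 to a high percentage means the GPU really is doing the work, fans or not.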

Yeah, it has gotten colder here recently so I’m wondering if the colder weather is keeping the fans off for a longer period of time.

Try this tool?

Works great on the AWS instances

I like the command: watch --color -n1.0 gpustat --color

1 Like

looks like @KevinB is using the same setup as I am, Ubuntu + 1080 :wink:

1080 Ti, it has been great so far. The best thing about it is that I don’t have any excuse for poor performance on account of my machine, so I know it’s my fault if there is an issue.


I’d advise you to check the following:

import torch.cuda
if torch.cuda.is_available():
    print('PyTorch found cuda')
else:
    print('PyTorch could not find cuda')

What do you see when you run the above?

Did you follow all instructions while installing CUDA?

Did you set your CUDA_HOME env variable properly? (I am not sure whether PyTorch relies on this, but I believe it’s safe to set.) On my system, for example:

(fastai) as@neuron:~/store/git$ echo $CUDA_HOME
(fastai) as@neuron:~/store/git$
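For reference, a common way to set it on Linux, assuming the default install prefix /usr/local/cuda (adjust the path to wherever your CUDA toolkit actually lives):

```shell
# Assumed default CUDA install prefix; change this if yours differs.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
echo "$CUDA_HOME"
```

Put those lines in your ~/.bashrc if you want them to persist across shells.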

Have you ever seen it ON?

Personal story:

For a couple of months after purchasing my 1080 Ti (which I had shipped from the US), I was unaware that the fan was in a non-working condition out of the box. I too was under the impression that the fan would kick in only if the GPU felt the need, so I never checked it.

On testing after a couple of months, I found that the fan would not spin under any condition, and the GPU would trip if the temperature exceeded 90C. Getting it replaced would have cost me 1/6th to 1/5th the price of the GPU due to shipping and customs duty (I had to bear the cost of sending it to them and paying the octroi on return - this is EVGA Taiwan).

As a hack, I opened up the top acrylic cover to expose the internal heat sinks. Then set up an arrangement of 3 fans (2 blowers and 1 exhaust) for cooling. Looks badass now.



Considering your GPU is not really being used, I find your temperatures really high. For instance, here is a capture of nvidia-smi from my GTX 1080 Ti SCII from EVGA:

With my ambient temperature around 15C.
I don’t know which model of GTX 1080 Ti you have, but keep in mind that the higher the voltage set on your card (either by default or manually), the higher your temps will inevitably be.
I don’t know how to retrieve the GPU core voltage on Linux, but here are the power-draw results in watts from the nvidia-smi -i 0 -q -d POWER command:

==============NVSMI LOG==============

Timestamp                           : Wed Nov  8 16:38:18 2017
Driver Version                      : 384.59

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Power Readings
        Power Management            : Supported
        Power Draw                  : 10.46 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 300.00 W
    Power Samples
        Duration                    : 116.01 sec
        Number of Samples           : 119
        Max                         : 11.23 W
        Min                         : 9.20 W
        Avg                         : 10.14 W

Hope it helps… somehow :slight_smile:

1 Like

Also, when the card reaches a threshold temperature (which you can get with nvidia-smi -i 0 -q -d TEMPERATURE), your performance will inevitably decrease, as the GPU automatically downclocks itself so as not to reach the “Shutdown temperature”. Here are my results:

~ ➜ nvidia-smi -i 0 -q -d TEMPERATURE

==============NVSMI LOG==============

Timestamp                           : Wed Nov  8 16:41:47 2017
Driver Version                      : 384.59

Attached GPUs                       : 1
GPU 00000000:01:00.0
        GPU Current Temp            : 26 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        Memory Temp                 : N/A

I already stress tested my card (which is the first thing I do when I receive a new one) and it never went above 80C, even during summer when ambient temperatures were around 30C.
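If you want to read those thresholds programmatically, a small parser over the -q -d TEMPERATURE output is enough. This is just a sketch over the "key : value" lines, run here against a pasted sample rather than a live query (parse_temps is my own helper name):

```python
# Sample lines in the shape nvidia-smi -i 0 -q -d TEMPERATURE prints.
SAMPLE = """\
        GPU Current Temp            : 26 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        Memory Temp                 : N/A
"""

def parse_temps(text):
    """Extract integer Celsius readings from nvidia-smi temperature output."""
    temps = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.endswith(" C"):  # skips N/A and non-temperature lines
            temps[key.strip()] = int(value[:-2])
    return temps

print(parse_temps(SAMPLE))
```

To use it live, feed it the stdout of the nvidia-smi command instead of SAMPLE, then compare Current against Slowdown to see how much headroom you have.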

1 Like

What you can do instead is replace the cooling block on your card with a watercooling one. Like this one. Very clever of you to find this hacky solution, btw, but I don’t think it will hold up if you use your GPU at 100% for days (well, you tell me aha :smiley: ).

Haha how I wish I could fit a water cooling block :smile:

The deterrent was that I saw videos of how these are fitted. It involves opening up the GPU (50 to 80 screws: big, small, and tiny) and exposing the chip. Given that there are no local service centers in my country, this was a very risky proposition. Very high chance of making mistakes :slightly_frowning_face:

This solution holds at 85C most of the time, though I am making use of my AWS credits first :slight_smile:

Oh okay, sad to hear that :frowning: But 50 to 80 screws? That sounds insane. Usually when you take a GPU apart you have 3 parts:

  • The board
  • The “radiators” (the metal grid)
  • The plastic mount (the thing on top of the metal grid)

You only need to unmount the part between the board and the metal grid to put a new one on. See here (I’ve set the right timer). But well, if your solution works for you then kudos for that :smiley: . I’ll keep it in mind for when my own card dies.

Btw, just a random thought: maybe your fans aren’t spinning because the power connector between your GPU board and your plastic mount (where the fans are) isn’t properly connected? (You can see what I’m talking about in the video linked above.) Did you check that you used the right power cables from your power supply on your card? Having 1 fan not working is ok, but having both not working looks like a power-delivery issue.

Yeah lots of screws :frowning:

Yes, the connection can be a problem. Unfortunately the one I have (EVGA 1080 Ti FE) has a blower-style fan, and the connection goes inside the GPU. I would have to undo all 50 screws to check this, unfortunately.

So I will probably take a chance once the card gets a bit older :slight_smile: