Show_install(0) CUDA Issues

I integrated your suggestions and some from the article you linked. Thank you, @suvash.

@stas I'm not sure whether this is a good suggestion: I use a "hacky" script that sends me a notification via a Telepyth bot whenever my GPU load drops below 10%, so that I can set up another training task in case a long run has finished.

Thanks, @init_27. I am thinking more about various programmatic solutions for identifying GPU states and needs. What users can do with that information once it's acquired is vast, so the latter probably belongs on the forums. I'm sure some people will find your script useful, so please don't hesitate to share it.
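For example, here is a minimal sketch of what I mean by querying GPU state programmatically, using the pynvml bindings (e.g. the nvidia-ml-py3 package); these are plain NVML wrappers, nothing fastai-specific:

import pynvml

# initialise NVML and grab a handle to the first GPU
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# utilization rates: .gpu and .memory are percentages
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
# memory info: .used / .free / .total are in bytes
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)

print(f"gpu util: {util.gpu}%  mem util: {util.memory}%")
print(f"mem used: {mem.used / 2**20:.0f} MiB of {mem.total / 2**20:.0f} MiB")

pynvml.nvmlShutdown()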

Got it.

Thanks @stas, here's the code dump. I keep it running in a Jupyter notebook in another tab. I could write a bash script, but this is the lazy option:

import time

def get_usage():
    # query the current GPU utilization % via nvidia-smi (IPython shell magic)
    a = !nvidia-smi --query-gpu=utilization.gpu --format=csv
    return int(a[1].replace("%", ""))

def notify():
    # ping me on Telegram via the telepyth CLI
    !telepyth -t <token_here> "GPU IDLE!"

while True:
    if get_usage() < 10:
        # re-check after 10 minutes so a short pause between batches doesn't trigger it
        time.sleep(600)
        if get_usage() < 10:
            notify()
    time.sleep(60)  # poll once a minute

There's also a Python library named gpustat which does pretty much the same thing.
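If you prefer to stay in Python rather than shelling out, something along these lines should work (a rough sketch; the attribute names are taken from gpustat's docs, so double-check them against your installed version):

import gpustat

# same data as the `gpustat` CLI, but as Python objects
stats = gpustat.new_query()
for gpu in stats:
    print(f"[{gpu.index}] {gpu.name}: "
          f"{gpu.utilization}% util, "
          f"{gpu.memory_used}/{gpu.memory_total} MiB")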

@stas I've realised that the only things I really watch are the GPU (core) usage % and the GPU memory usage %. I have a little nvmon script on my path, because I can't remember/type out that command at all.

This could be a helpful little script (which in turn creates the nvmon script) to sneak into the image-creation or instance/OS bootstrapping process:

#!/usr/bin/env bash

set -euo pipefail

# Nvidia monitor script
NVIDIA_MONITOR_SCRIPT="/usr/local/bin/nvmon"

echo "Writing the nvmon script at $NVIDIA_MONITOR_SCRIPT"

cat <<'EOF' | sudo tee "$NVIDIA_MONITOR_SCRIPT"
#!/usr/bin/env bash

nvidia-smi --query-gpu=pstate,utilization.gpu,utilization.memory --format=csv -l 1
EOF

sudo chmod +x "$NVIDIA_MONITOR_SCRIPT"
echo "$NVIDIA_MONITOR_SCRIPT is now copied in place"

Me too - which is exactly what nvidia-smi dmon does, isn't it?

Yep, and a couple more things.

But I'm sure that once my brain is trained to look at the sm and mem columns, I can probably ignore everything else. Maybe I shouldn't fight dmon and should just learn where to look.

I was just checking man nvidia-smi and realised that I could simply select the utilization group to be monitored: nvidia-smi dmon -s u (the -s u flag shows only the sm/mem/enc/dec utilization columns). I'll try to remember that. No need for more wacky scripts then.

I updated https://docs.fast.ai/dev/gpu.html with @suvash and @ecdrid's contributions - thank you.

Please share the correct link - I can't see the page ("Site not found").

https://docs.fast.ai/dev/gpu.html

Looks like they moved it to here.
