Hardware Bottlenecks and Training Speeds

Training is taking much longer for me than what I see in the course-v3 notebooks. A simple classifier takes a few minutes to train where the notebook takes only seconds, and a multi-label classifier or unet learner takes about an hour per epoch instead of the few minutes shown in the notebooks.

Relevant Specs:
CPU: i5 7600k @ 3.8GHz (No Hyperthreading)
GPU: GTX 1070
GPU2: GTX 1060 3GB
RAM: G.Skill 16GB
HDD: Seagate Barracuda ST1000DM003-1CH162 1 TB
SSD: Samsung 850 Evo 256 GB

When Training:
CPU Usage: 100%
1070 Mem Usage: 6%
1060 Mem Usage: 3%
RAM Usage: usually at 60-70%

I’m running Debian Stretch on my HDD and Win10 on my SSD. I’m not sure whether I should move the data to the SSD instead, or upgrade to a CPU with multi-threading. Is my CPU bottlenecking my GPU? If so, is there anything I can do to fix it? I would also like to know how I can train another learner on my second GPU.
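
In case it’s useful, here is a minimal sketch of how one could send a second learner to the other GPU with plain PyTorch / fastai v1. The device index 1 is an assumption (check nvidia-smi to see which index the 1060 gets on your box):

import torch

# Select GPU 1 (assumed to be the 1060) before building the learner;
# models and batches created afterwards will default to this device.
torch.cuda.set_device(1)

# With fastai v1 you can instead set the library-wide default device:
# from fastai.torch_core import defaults
# defaults.device = torch.device('cuda:1')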

UPDATE:
Training took so long because my system wasn’t configured properly. Debian Stretch only supports the Nvidia drivers up to 390.87, so I had been training without a GPU the whole time. I’ve tried using cuda-9.0 and cuda-8.0, and running:

python -c 'import fastai.utils.collect_env; fastai.utils.collect_env.show_install(1)'

Tells me that I have

=== Hardware === 
nvidia gpus   : 2
Have 2 GPU(s), but torch can't use them (check nvidia driver) 
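
For anyone who sees the same message, here is a quick sanity check from Python using only standard torch.cuda calls; if the driver is broken, is_available() returns False and everything silently falls back to the CPU, which matches the slow epochs above:

import torch

print(torch.cuda.is_available())   # False here, matching the output above
print(torch.cuda.device_count())   # should be 2 once the driver works
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))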

UPDATE:
I’ll try switching my OS to the latest Ubuntu installation, and hopefully that will be the end of it.

Latest:
I switched my system to Ubuntu 18.10 instead. I’ll try running the notebooks when I get back from work.

I’m no expert, but it sounds like it’s not using the GPU. My 1070 Ti hits 60-70% memory usage running the first notebook, and my times are in line with the notebook (accounting for the hardware differences, of course).

Try running nvidia-smi -l during a run to see if your GPU is being utilized at all?

My GPU is only drawing about 6 watts and using about 209-227 MB of memory. I’m not sure if I’m running this properly. The only process it shows is Xorg; I don’t see any other process. I’m not sure how I would force the GPU to be used if everything is running on the CPU.

At the bottom of your nvidia-smi output you should definitely see your Python process using the GPU … since you don’t, it appears you are not using the GPU.

The Troubleshooting section in the fastai docs site has info on how to check whether you’re using your GPU or not.
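
As a rough illustration of the kind of check described there (the learn object below is hypothetical, just whatever Learner the notebook built), you can inspect where the model’s parameters live and run a small on-GPU computation that should briefly show up in nvidia-smi:

import torch

# Expect device(type='cuda', index=0) if the learner is actually on the GPU.
print(next(learn.model.parameters()).device)

# A short matrix multiply on the GPU will briefly register memory and utilization in nvidia-smi.
x = torch.randn(4096, 4096, device='cuda')
y = x @ x
torch.cuda.synchronize()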

Hi,

A Google search suggests it is possible to install newer Nvidia GPU drivers on Debian Stretch, e.g. in this link. Have you tried that? Note that if your BIOS/UEFI has Secure Boot enabled, updating the drivers will be more difficult, because you need to enroll a Secure Boot key before you can install new Nvidia drivers. Good luck!

Yijin

Sorry for replying a little late, but yes, I did try to install the Nvidia drivers directly from the runfile. It was a pain because my system either would not boot or would leave one screen black. I don’t have Secure Boot enabled, so figuring out what was wrong was frustrating.

I read something about needing an older CUDA version to be able to run the notebooks, but by the time I realized that, I had already switched to Ubuntu 18.10.

Glad to hear it all worked out.

Besides nvidia-smi, I also use gpustat (just a pip install) to check GPU usage; it quickly shows who is using how much GPU memory on my shared box. If you are the only one using your GPU, then nvidia-smi should tell you what you need to know.

Thank you for your help. I even went ahead and tried one last time to install the drivers on Stretch, and it is now up and running.

I’ll be sure to check out gpustat as well.

You can also use NVIDIA X Server Settings for a visual interface, or nvidia-smi on the command line, to check GPU usage.
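
If you prefer checking from inside Python, a small sketch using standard torch.cuda calls reports how much GPU memory the current process has allocated through PyTorch, which you can compare against what nvidia-smi shows:

import torch

# Memory allocated by this process through PyTorch, in MB.
if torch.cuda.is_available():
    mb = torch.cuda.memory_allocated() / 1024**2
    print(f'{mb:.1f} MB allocated on {torch.cuda.get_device_name(0)}')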