Optimization of local setup


(Cynosure) #1

Hi everyone-

After spending a couple of bucks on AWS, I decided to build my own DL machine sooner than planned. I have a desktop machine containing:

  1. ROG-STRIX-Z270E-GAMING motherboard
  2. 32GB RAM
  3. NVIDIA GTX 1070 card
  4. i7-7700K processor
  5. M.2 SSD
  6. Ubuntu 16.04 (dual boot with Windows 10).

I am new to Ubuntu and DL, so I am trying to optimize the system under Ubuntu. The particular issues I have:

  1. While training on the dogs and cats data set, Jupyter Notebook is barely using the GPU and is using the CPU instead (as far as I can tell, GPU utilization is around 5% while the CPU is at 100%). Snapshots are below.

I think there is a problem somewhere, as the notebook should be using the GPU more than the CPU. The settings in .theanorc are:

[global]
floatX = float32
device = gpu

[cuda]
root = /usr/local/cuda
[lib]
cnmem = 1

Does anybody have an idea where and which setting I need to change to make it use the GPU?

  2. (Not related to ML.) The machine has water cooling, 6 case fans and 3 GPU fans. The GPU fans are idle when the GPU is idle, but the case fans and water pump run at high rpm and make an annoying amount of noise even when the CPU is idle. I tried lots of things mentioned on the internet, but somehow I can't seem to control the fans (in Windows I can control them easily via Asus AI Suite 3).

I tried the steps mentioned here:
install lm-sensors
sensors-detect
sudo pwmconfig – never lists any fan or pump; it always says …no PWM modules installed…

Capture3

Here it says cpu_fan is at 0 rpm, which is obviously false, as the CPU is at full utilization and the fans are working. It is taking more than an hour to train the model so far.
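One way to cross-check a reading like this is to look at the raw values the kernel exposes under /sys/class/hwmon, which is where lm-sensors gets its data. The following is only a sketch: the helper name read_fan_speeds is made up for illustration, and it assumes the standard hwmon layout (hwmonN/fanM_input files holding RPM values). An empty result would suggest that no driver for the board's fan controller is loaded, which would be consistent with pwmconfig finding nothing.

```python
import glob
import os


def read_fan_speeds(hwmon_root="/sys/class/hwmon"):
    """Return {"<chip>/<fanN>": rpm} for every fan*_input file found.

    read_fan_speeds is a hypothetical helper, not part of lm-sensors.
    """
    speeds = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "fan*_input")
    for path in sorted(glob.glob(pattern)):
        chip_dir = os.path.dirname(path)
        chip = os.path.basename(chip_dir)
        # Prefer the driver's own chip name when it is available.
        name_file = os.path.join(chip_dir, "name")
        if os.path.exists(name_file):
            with open(name_file) as handle:
                chip = handle.read().strip()
        fan = os.path.basename(path).replace("_input", "")
        try:
            with open(path) as handle:
                speeds[chip + "/" + fan] = int(handle.read().strip())
        except (IOError, OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
    return speeds


if __name__ == "__main__":
    print(read_fan_speeds() or "no fan sensors exposed by the loaded drivers")
```

If a fan shows up here with a plausible RPM but lm-sensors still reports 0, the mismatch is probably in the sensors configuration rather than in the hardware.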

Any help on this will make my DL experience much better :)

thanks!


(Cynosure) #2

With the default settings (through install-gpu.sh) I have the following entries in the .theanorc file:

Capture4

But then I get this error while trying to import utils:


(Matthew Kleinsmith) #3

The error says to replace “device = gpu” with “device = cuda” in .theanorc. Give that a shot and let us know what happens. You will need to restart your Jupyter notebook kernel and reimport theano.
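If you would rather script the change than edit the file by hand, here is a minimal sketch using Python 3's standard configparser. The helper name set_theano_device is made up for illustration, and SAMPLE just mirrors the .theanorc posted above. It works on a string, so nothing on disk is touched.

```python
import configparser
import io


def set_theano_device(theanorc_text, device="cuda"):
    """Return the .theanorc text with [global] device set to `device`.

    set_theano_device is a hypothetical helper, not part of Theano.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. floatX stays floatX
    parser.read_string(theanorc_text)
    if not parser.has_section("global"):
        parser.add_section("global")
    parser.set("global", "device", device)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Sample input copied from the settings posted earlier in the thread.
SAMPLE = """[global]
floatX = float32
device = gpu

[cuda]
root = /usr/local/cuda

[lib]
cnmem = 1
"""

if __name__ == "__main__":
    print(set_theano_device(SAMPLE))
```

To apply it for real you would read ~/.theanorc, pass its contents through the function and write the result back, ideally after making a backup copy.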


Note: Theano is no longer being developed (source), and you should switch to another framework eventually. Consider PyTorch or TensorFlow. Depending on where you are on your fast.ai learning path, one may be more appropriate than the other.

Here are some PyTorch versions of some fast.ai part 1 notebooks:
https://github.com/recastrodiaz/fastai-course-1-pytorch

Here’s a thread with some more PyTorch learning resources:
http://forums.fast.ai/t/part-1-notebooks-in-pytorch/3114

I don’t mean to overwhelm. If the theano change works, it might be better to go along with it for a while and switch after seeing the deep learning concepts in action.


(Cynosure) #4

Hi Matthew,

Thanks for the quick reply.
I tried changing it to device=cuda, and then I get the errors below.

When I try to import theano I get this error:

WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
ERROR (theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.7 or higher required)
Traceback (most recent call last):
  File "/home/user/.local/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 21, in <module>
    import pygpu
ImportError: No module named pygpu

And on import utils I get this error:

import utils; reload(utils)

from utils import plots

WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
ERROR (theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.7 or higher required)
Traceback (most recent call last):
  File "/home/user/.local/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 21, in <module>
    import pygpu
ImportError: No module named pygpu
Using Theano backend.

I ran the first lesson of part 1 fine on AWS and am now trying to run it on the newly set up local machine.

I guess I’ll have to switch to PyTorch. I’ll look into that.


(Matthew Kleinsmith) #5

Some ideas:

  • Try installing pygpu via pip install pygpu (or equivalent)
  • Try solutions from this github issue thread: https://github.com/Theano/libgpuarray/issues/514
    • I found the thread by googling “ImportError: No module named pygpu”
  • Consider using PyTorch, as you mentioned.
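Before and after trying any of these, it can help to confirm whether the interpreter behind the notebook can actually see pygpu. Here is a small sketch (module_available is a made-up helper name); it should be run from inside the Jupyter notebook itself, since the traceback above points at a per-user Python 2.7 site-packages directory and a different Python on the same machine may see different packages.

```python
import importlib


def module_available(name):
    """Return True if `name` imports cleanly in the current interpreter.

    module_available is a hypothetical helper for illustration.
    """
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    for name in ("theano", "pygpu"):
        status = "found" if module_available(name) else "missing"
        print(name, status)
```

If pygpu shows up as missing here even after installing it, the install most likely went to a different Python than the one the notebook kernel is using.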

(Cynosure) #6

Update! (in case it can help somebody else)

I have partially solved my problems. After reading some info on PyTorch I decided to stay with Theano for now: I thought, and somebody suggested, that running the Theano-based lessons on PyTorch would be too much to learn while I am still learning everything else.

OK. My GPU wasn’t being utilized for one or both of these reasons:

  • Theano was an older version (.6 something, I think)
  • pygpu wasn’t properly set up. Here is how to set it up.

Troubleshooting tips:

  • This page contains details about the .theanorc file. As mentioned there, I added the line force_device=true to .theanorc (nano .theanorc) to force it to use the GPU.
  • This page contains detailed info on making the notebooks use the GPU. It also contains a Python script, which I ran in Jupyter Notebook to check whether it is running on the CPU or the GPU. That script does exactly that!

Then I ran into another weird problem, an error saying “Could not initialize pygpu, support disabled”; some info from here helped.

So now I just ran dogs and cats: it used the GPU at 100% and only one CPU core was at 100%. Problem solved.
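For anyone who wants to watch these utilization numbers from a script rather than a screenshot, here is a rough sketch that parses the CSV output of nvidia-smi. The function names are made up for illustration, and the parser assumes every field is a plain integer; it would need extra handling for cards that report a field as unsupported.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse nvidia-smi CSV output into one dict per GPU (hypothetical helper)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = [field.strip() for field in line.split(",")]
        stats.append({
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and parse the result (needs the NVIDIA driver)."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_gpu_stats(output)
```

Calling query_gpu_stats() in a loop with a short sleep while a model trains gives a quick view of whether the GPU is really doing the work.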

Making the rig a bit quieter: I have Corsair cooling (H100i v2), for which they have a tool for Windows but none for Linux. Somebody came up with a script from somewhere which helped reduce the pump speed a bit. Both the fan and the pump are above 1500 rpm even when idle. If somebody has any further tips, they would be much appreciated, as it would help reduce the strain on my ears while training models :slight_smile: