Personal DL box

sermakarevich · November 21, 2017, 8:36pm

Need some advise, guys.
Is GTX 1080Ti worth additional $200 over GTX1080? I know it has +3GB memory so it should be quicker as potentially it can process more pictures at once. But when I was training models on p2.xlarge I have never seen memory load more than 60% (it is K80 if I am not mistaken, so 60% is 7GB). Limitation probably was on a hard drive IO. Patriot HellFire MLC can do 3000/2400MB/s but I I cant sync all this math to understand if I can load all 11GB of 1080Ti?

And a simpler question: should 8 GB RAM be enough? Or maybe somebody can give feedback on config?

KevinB · November 21, 2017, 9:25pm

Aws is a great place to start so I think it’s important to expand this a bit. Using an online platform is great to test the waters and decide if you want to explore deep learning further. In the long run, it makes more sense to buy a personal machine to change the cost model from a pay per hour cost to a pay once and explore all you want.

KevinB · November 21, 2017, 9:27pm

Yes. Get the bigger gpu. I have the 1080Ti and it is really a great one. I guess I can’t speak for the 1080 as I’ve never had it, but I don’t regret spending the extra money in this area at all.

Just to also point out, if you already had a gtx1080 I probably wouldn’t recommend upgrading to the 1080Ti, but when you are buying one or the other, go for the better hardware.

Moody · November 21, 2017, 10:45pm

If I have a personal DL box, how can I replicate the whole AWS fastai environment/library on Linux?

I tried to set up a virtual environment for Py36 and started installing few dependencies. But, I experienced lots of error messages about missing other dependencies. Should I ignore the error messages and keep installing all the dependencies (see the full list below)? Any short cut for this process? For future maintenance, am I required to run “git pull” and “conda env update” on a regularly basis?

My laptop has GTX1070, 32GB RAM, 2 x 1TB SSD (dual OS - Windows and Linux).

github.com

fastai/fastai/blob/master/environment.yml

name: fastai
channels:
- fastai
- pytorch
- defaults
- peterjc123
dependencies:
- scipy
- cuda90
- cudnn
- numpy
- pillow
- jpeg
- spacy
- zlib
- freetype
- libtiff
- bleach
- certifi
- cffi

This file has been truncated. show original

KevinB · November 21, 2017, 11:47pm

Once you have conda installed, it is really cool. You can use:

conda env create -f environment.yml

https://conda.io/docs/user-guide/tasks/manage-environments.html

This downloads all of the dependencies and then all you have to do is:

source activate fastai

Then to maintain it, you will have to do the git pull and conda env update.

I would recommend putting something in your .bashrc file that does the source activate unless you like to have more flexibility than that.

aloisius · November 21, 2017, 11:47pm

I setup an Ubuntu box a couple weeks ago. I didn’t use conda, but I do have everything in a venv. I haven’t bothered updating any python modules after installing them the first time and haven’t run into any problems.

What kind of error messages are you getting? Python modules should install their dependencies automatically.

jeremy · November 22, 2017, 1:04am

Yup, this is our recommended approach. Let us know if you get stuck!

jamesrequa · November 22, 2017, 1:27am

I would definitely get the 1080Ti if I were you, I was being cheap and got the 1070 and regret it now

Also I would get 32gb RAM if you want to be able to handle larger data sets with bigger batch sizes. I have 16gb and wish I had more. I def think 8gb is too little so at least get 16gb.

jeremy · November 22, 2017, 2:16am

Just to clarify - more RAM in your PC doesn’t help with bigger batch sizes. Only more RAM in your GPU, which you can’t increase unfortunately.

KevinB · November 22, 2017, 2:18am

More RAM would allow you to handle more data in preprocessing correct?

jeremy · November 22, 2017, 2:23am

Not really - preprocessing a batch takes very little RAM. Shouldn’t ever be a bottleneck.

helena · November 22, 2017, 3:32am

i have GeForce GTX 1080 - 8GB, and recent NN architectures brought up the “out of memory” issue to deal with

Moody · November 22, 2017, 5:26am

Haha. I got stuck as expected. Thanks for @johnnyv as my remote IT support. Here are the procedure to replicate fastai environment to a local machine.

In terminal, under ~/anaconda3/envs/ directory,
$ git clone https://github.com/fastai/fastai.git
$ cd fastai/ (the environment.yml file is under this directory)
In ~/anaconda3/envs/fastai/ directory,
$ conda env create -f environment.yml

gokkulnath · November 22, 2017, 5:54am

What architecture are you running ? I am just curious to know … I have been playing around with ResNets and VGG for a while haven’t faced any such issues.

Regards,

Gokkul

Moody · November 22, 2017, 6:51am

@jeremy When I started running the notebooks, I came crossed “cannot find module bcolz”. So, I installed bcolz individually. However, I got an error message saying bcolz was installed. I removed everything and reinstalled the whole repo by using download zip file from GitHub instead of git clone. However, no improvement. I found others discussing similar problem in Paperspace few days ago. But, no solution in the forums. Any idea? In the meantime, I will use AWS.

johnnyv · November 22, 2017, 7:02am

Hi Sarada,

BTW, it’s not necessary to run the git clone command from the ~/anaconda3/envs directory as you describe in step 1. You can run that from anywhere on your hard drive!

I prefer to keep all the source code that I download in one place, under ~/source, so I would have gone cd ~/source in step 1 instead.

johnnyv · November 22, 2017, 7:03am

I’ve found that sometimes if I can’t install it using pip install bcolz that I try conda install bcolz instead, and if the package is available with conda it usually works after that. Have you tried using conda to install bcolz?

helena · November 22, 2017, 1:07pm

sure, i had no problem with VGG-like but my input images were gray-scale and relatively small. But when i tried to switch to Xception (Keras version) which is pretty deep i needed to decrease the batch size substantially - same with Resnet, a smaller batch helped.
but currently i’m running a CycleGAN, and OOM error causes a more serious problem - since the batch size is 1, i needed to decrease an training images size - practically half it in order to make it work - the generated images look not too bad but maybe the result would be better if i were running the full size…

jeremy · November 22, 2017, 5:50pm

Thanks for posting that! One more step:

source activate fastai

You need to do that step every time you login. Or else put it in your .bashrc

timlee · November 22, 2017, 7:49pm

hi All,

For those of you who have been building a new box. I just finished installing nvidia cuda and cudnn drivers last night. Had some issues getting the drivers my Ubuntu box, but found a very useful tutorial is actually found on the opencv website. If someone is just setting up their desktop build for the first time, hope these notes are helpful.

If anyone else has any feedback or installation notes, or a better guide, I would be interested in their experiences as well. I know some of my fellow USF master’s students have been re-configuring old computers to use as DL boxes.

Cheers,

Tim

Installation Guide covers:

installation of Nvidia drivers on Ubuntu, specifically CUDA, CUDNN drivers
setup of python environments for deep learning frameworks (ignore if you want to use conda for package installations)

A couple of caveats:

Know your Framework / Driver Version Compatibility: Before you start installing any of the software, note compatibility issues with Torch. From the website the only links available are for CUDA 7.5 or 8.0, which are older versions. To make Torch run on CUDA 9, you have to clone a repo + install (a bit more complicated)
Restart your comp after drivers are installed: Once the CUDA is installed, make sure to reboot your machine to make sure the drivers are installed.
Check versions between CUDA + CUDNN Make sure the CUDNN + CUDA versions are matched correctly with the framework you want to use
Note Python version + Framework Compatibility: If ever interested in tensorflow, make sure your python version matches (sometimes TF is looking for 3.5 instead of the current 3.6)
Recommend the .deb installation method : There’s two ways of installing the nvidia CUDA drivers, the .deb / local run file option. IMO, the .deb(local) approach is much cleaner and easier to manage. (see img below)

Installing Deep Learning Frameworks on Ubuntu with Cuda Support

https://www.learnopencv.com/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/

My Rig:
Intel i5 (from 2011)
32 GB RAM
Nvidia GTX1080
1 x 500GB SSD ubuntu
Some other HD’s for storage and a Windows Boot