Personal DL box

(sergii makarevych) #24

Need some advice, guys.
Is the GTX 1080 Ti worth an additional $200 over the GTX 1080? I know it has 3GB more memory, so it should be quicker, since it can potentially process more pictures at once. But when I was training models on a p2.xlarge I never saw memory load above 60% (it's a K80 if I'm not mistaken, so 60% is about 7GB). The limitation was probably hard-drive IO. A Patriot Hellfire MLC can do 3000/2400 MB/s, but I can't put the math together to figure out whether I could actually make use of all 11GB of the 1080 Ti.
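For a rough sense of whether 11GB is usable, here is a back-of-the-envelope sketch. Both the per-image cost and the model size below are illustrative assumptions, not measurements — real usage varies enormously by architecture, image size, and optimizer:

```shell
# Back-of-the-envelope: how many images fit in 11GB of GPU RAM at once?
per_image_mb=50                      # assumed activations + gradients per 224x224 image
model_mb=2000                        # assumed weights + optimizer state
gpu_mb=$((11 * 1024))                # 11GB card
batch=$(( (gpu_mb - model_mb) / per_image_mb ))
echo "room for a batch of roughly $batch images"
```

The point of the sketch: GPU memory mostly caps your batch size, and you fill it by raising the batch size or image resolution; whether disk IO can feed the card fast enough is a separate question.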

And a simpler question: should 8GB of RAM be enough? Or maybe somebody can give feedback on this config?

(Kevin Bird) #25

AWS is a great place to start, so I think it's important to expand on this a bit. Using an online platform is a great way to test the waters and decide if you want to explore deep learning further. In the long run, it makes more sense to buy a personal machine to change the cost model from pay-per-hour to pay once and explore all you want.

(Kevin Bird) #26

Yes, get the bigger GPU. I have the 1080 Ti and it really is a great one. I can't speak for the 1080 as I've never had it, but I don't regret spending the extra money in this area at all.

Just to also point out: if you already had a GTX 1080 I probably wouldn't recommend upgrading to the 1080 Ti, but when you are buying one or the other, go for the better hardware.

(Sarada Lee) #27

If I have a personal DL box, how can I replicate the whole AWS fastai environment/library on Linux?

I tried to set up a virtual environment for Py36 and started installing a few dependencies. But I ran into lots of error messages about missing other dependencies. Should I ignore the error messages and keep installing all the dependencies (see the full list below)? Is there any shortcut for this process? For future maintenance, am I required to run "git pull" and "conda env update" on a regular basis?

My laptop has a GTX 1070, 32GB RAM, and 2 x 1TB SSDs (dual boot - Windows and Linux).

(Kevin Bird) #28

Once you have conda installed, it is really cool. You can use:

conda env create -f environment.yml

This downloads all of the dependencies and then all you have to do is:

source activate fastai

Then to maintain it, you will have to do the git pull and conda env update.

I would recommend putting the source activate line in your .bashrc file, unless you prefer to have more flexibility than that.
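In practice that suggestion could look like the snippet below — a minimal addition to ~/.bashrc (the conda guard is my own addition, so the line is harmless on machines without conda):

```shell
# Hypothetical ~/.bashrc addition: activate the fastai env in every new shell,
# but only if conda is actually installed on this machine.
if command -v conda >/dev/null 2>&1; then
    source activate fastai
fi
```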

(Jordan) #29

I set up an Ubuntu box a couple of weeks ago. I didn't use conda, but I do have everything in a venv. I haven't bothered updating any Python modules since installing them the first time and haven't run into any problems.

What kind of error messages are you getting? Python modules should install their dependencies automatically.

(Jeremy Howard) #30

Yup, this is our recommended approach. Let us know if you get stuck!

(James Requa) #31

I would definitely get the 1080 Ti if I were you. I was being cheap and got the 1070, and I regret it now :slight_smile:

Also, I would get 32GB of RAM if you want to be able to handle larger datasets with bigger batch sizes. I have 16GB and wish I had more. I definitely think 8GB is too little, so get at least 16GB.

(Jeremy Howard) #32

Just to clarify - more RAM in your PC doesn’t help with bigger batch sizes. Only more RAM in your GPU, which you can’t increase unfortunately.

(Kevin Bird) #33

More RAM would allow you to handle more data in preprocessing, correct?

(Jeremy Howard) #34

Not really - preprocessing a batch takes very little RAM. Shouldn’t ever be a bottleneck.
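As a quick sanity check on that point, here is the arithmetic for one fairly large batch (the batch size and image dimensions are assumptions for illustration):

```shell
# System RAM needed to hold one preprocessed batch:
# 64 RGB images at 224x224, stored as float32 (4 bytes per value).
bytes=$((64 * 224 * 224 * 3 * 4))
mb=$((bytes / 1048576))
echo "${mb}MB per batch"             # roughly 36MB - trivial next to system RAM
```

Even a generous batch comes out to a few tens of megabytes, which is why host RAM is rarely the bottleneck for preprocessing.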

(helena s) #35

I have a GeForce GTX 1080 (8GB), and recent NN architectures have brought up the "out of memory" issue to deal with.

(Sarada Lee) #36

Haha, I got stuck as expected. :sweat_smile: Thanks to @johnnyv as my remote IT support. Here is the procedure to replicate the fastai environment on a local machine.

  1. In terminal, under ~/anaconda3/envs/ directory,
    $ git clone

  2. $ cd fastai/ (the environment.yml file is under this directory)

  3. In ~/anaconda3/envs/fastai/ directory,
    $ conda env create -f environment.yml

(Gokkul Nath T S) #37

What architecture are you running? I am just curious to know … I have been playing around with ResNets and VGG for a while and haven't faced any such issues.



(Sarada Lee) #38

@jeremy When I started running the notebooks, I came across "cannot find module bcolz". So I installed bcolz individually. However, I then got an error message saying bcolz was already installed. I removed everything and reinstalled the whole repo using the zip download from GitHub instead of git clone, but there was no improvement. I found others discussing a similar problem on Paperspace a few days ago, but no solution in the forums. Any ideas? In the meantime, I will use AWS.

(john v) #39

Hi Sarada,

BTW, it’s not necessary to run the git clone command from the ~/anaconda3/envs directory as you describe in step 1. You can run that from anywhere on your hard drive!

I prefer to keep all the source code that I download in one place, under ~/source, so I would have run cd ~/source in step 1 instead.

(john v) #40

I've found that when I can't install a package with pip install bcolz, trying conda install bcolz instead usually works, provided the package is available through conda. Have you tried using conda to install bcolz?

(helena s) #41

Sure, I had no problem with VGG-like networks, but my input images were grayscale and relatively small. When I tried to switch to Xception (the Keras version), which is pretty deep, I needed to decrease the batch size substantially - same with ResNet, a smaller batch helped.
But currently I'm running a CycleGAN, and the OOM error causes a more serious problem: since the batch size is already 1, I needed to decrease the training image size - practically halve it - to make it work. The generated images don't look too bad, but maybe the results would be better if I were running at full size…
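The saving from halving the image size is easy to see: assuming activation memory scales with pixel count (a rough assumption that holds for a fully convolutional network like CycleGAN's generator), halving each side gives a 4x reduction. The resolution below is hypothetical:

```shell
# Assumption: activation memory scales with pixel count in a fully
# convolutional network, so halving each side quarters the memory.
full_side=256                        # hypothetical training resolution
half_side=$((full_side / 2))
ratio=$(( (full_side * full_side) / (half_side * half_side) ))
echo "halving each side -> ${ratio}x less activation memory"
```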

(Jeremy Howard) #42

Thanks for posting that! One more step:

source activate fastai

You need to do that step every time you log in, or else put it in your .bashrc.

(Tim Lee) #43

Hi all,

For those of you who have been building a new box: I just finished installing the NVIDIA CUDA and cuDNN drivers last night. I had some issues getting the drivers onto my Ubuntu box, but found a very useful tutorial on the OpenCV website. If you are just setting up your desktop build for the first time, I hope these notes are helpful.

If anyone else has feedback, installation notes, or a better guide, I would be interested in your experiences as well. I know some of my fellow USF master's students have been re-configuring old computers to use as DL boxes.



Installation Guide covers:

  • installation of NVIDIA drivers on Ubuntu, specifically the CUDA and cuDNN drivers
  • setup of Python environments for deep learning frameworks (ignore this if you want to use conda for package installations)

A couple of caveats:

  1. Know your framework / driver version compatibility: Before you start installing any of the software, note compatibility issues with Torch. From the website, the only links available are for CUDA 7.5 or 8.0, which are older versions. To make Torch run on CUDA 9, you have to clone a repo and install from source (a bit more complicated).
  2. Restart your computer after the drivers are installed: Once CUDA is installed, make sure to reboot your machine so the drivers are properly loaded.
  3. Check versions between CUDA + cuDNN: Make sure the cuDNN and CUDA versions are matched correctly with the framework you want to use.
  4. Note Python version + framework compatibility: If you are ever interested in TensorFlow, make sure your Python version matches (sometimes TF is looking for 3.5 instead of the current 3.6).
  5. Recommended: the .deb installation method: There are two ways of installing the NVIDIA CUDA drivers - the .deb package or the local run file. IMO, the .deb (local) approach is much cleaner and easier to manage.

Installing Deep Learning Frameworks on Ubuntu with Cuda Support

My Rig:
Intel i5 (from 2011)
Nvidia GTX1080
1 x 500GB SSD for Ubuntu
Some other HDs for storage and a Windows boot