Personal DL box

Need some advice, guys.
Is the GTX 1080 Ti worth an additional $200 over the GTX 1080? I know it has 3GB more memory, so it should be quicker since it can potentially process more pictures at once. But when I was training models on a p2.xlarge I never saw memory load above 60% (it's a K80 if I'm not mistaken, so 60% is about 7GB). The limitation was probably hard drive IO. The Patriot HellFire MLC can do 3000/2400 MB/s, but I can't work through the math to figure out whether I could actually fill all 11GB of the 1080 Ti.
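For anyone wanting to sanity-check that math, here's a rough back-of-envelope sketch. The numbers are my own assumptions, not measurements: 224x224 RGB images stored as float32 tensors, and the HellFire's rated 3000 MB/s sequential read. Real pipelines read compressed JPEGs, so this is pessimistic.

```shell
# Back-of-envelope: how many images/sec can the SSD deliver?
BYTES_PER_IMAGE=$((224 * 224 * 3 * 4))       # ~0.57 MB per decoded 224x224 RGB float32 image
READ_BYTES_PER_SEC=$((3000 * 1024 * 1024))   # drive's rated sequential read speed
IMAGES_PER_SEC=$((READ_BYTES_PER_SEC / BYTES_PER_IMAGE))
echo "$IMAGES_PER_SEC images/sec from disk"
```

At thousands of images per second, an NVMe drive is unlikely to be the bottleneck; how much of the 11GB you fill is driven by batch size and model size, not disk speed.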

And a simpler question: should 8GB of RAM be enough? Or maybe somebody can give feedback on my config?

1 Like

AWS is a great place to start, so I think it's important to expand on this a bit. Using an online platform is a great way to test the waters and decide whether you want to explore deep learning further. In the long run, though, it makes more sense to buy a personal machine and change the cost model from pay-per-hour to pay once and explore all you want.

1 Like

Yes. Get the bigger GPU. I have the 1080 Ti and it is really a great one. I guess I can't speak for the 1080 as I've never had it, but I don't regret spending the extra money in this area at all.

Just to also point out: if you already had a GTX 1080 I probably wouldn't recommend upgrading to the 1080 Ti, but when you are buying one or the other, go for the better hardware.

5 Likes

If I have a personal DL box, how can I replicate the whole AWS fastai environment/library on Linux?

I tried to set up a virtual environment for Py36 and started installing a few dependencies. But I ran into lots of error messages about other missing dependencies. Should I ignore the error messages and keep installing all the dependencies (see the full list below)? Is there any shortcut for this process? For future maintenance, am I required to run "git pull" and "conda env update" on a regular basis?

My laptop has GTX1070, 32GB RAM, 2 x 1TB SSD (dual OS - Windows and Linux).

1 Like

Once you have conda installed, it is really cool. You can use:

conda env create -f environment.yml

https://conda.io/docs/user-guide/tasks/manage-environments.html

This downloads all of the dependencies and then all you have to do is:

source activate fastai

Then to maintain it, you will have to do the git pull and conda env update.
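A minimal update routine might look like this. The repo location is my assumption; use wherever you cloned fastai:

```shell
# Pull the latest fastai code, then sync the conda env with any
# dependency changes in environment.yml (run from the repo directory).
cd ~/fastai                           # assumption: where you cloned the repo
git pull
conda env update -f environment.yml
source activate fastai
```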

I would recommend putting something in your .bashrc file that does the source activate unless you like to have more flexibility than that.

2 Likes

I set up an Ubuntu box a couple of weeks ago. I didn't use conda, but I do have everything in a venv. I haven't bothered updating any Python modules after installing them the first time and haven't run into any problems.

What kind of error messages are you getting? Python modules should install their dependencies automatically.

Yup, this is our recommended approach. Let us know if you get stuck!

1 Like

I would definitely get the 1080 Ti if I were you. I was being cheap and got the 1070, and I regret it now :slight_smile:

Also, I would get 32GB of RAM if you want to be able to handle larger datasets with bigger batch sizes. I have 16GB and wish I had more. I definitely think 8GB is too little, so get at least 16GB.

3 Likes

Just to clarify - more RAM in your PC doesn't help with bigger batch sizes. Only more RAM in your GPU, which you can't increase, unfortunately.
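To see the GPU-side memory (the number that actually caps your batch size), nvidia-smi reports it directly, assuming the NVIDIA drivers are installed:

```shell
# Report total and currently-used GPU memory; this, not system RAM,
# determines how large a batch will fit.
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```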

3 Likes

More RAM would allow you to handle more data in preprocessing, correct?

Not really - preprocessing a batch takes very little RAM. It shouldn't ever be a bottleneck.

1 Like

I have a GeForce GTX 1080 (8GB), and recent NN architectures have brought up the "out of memory" issue to deal with.

1 Like

Haha. I got stuck as expected. :sweat_smile: Thanks to @johnnyv for being my remote IT support. Here is the procedure to replicate the fastai environment on a local machine.

  1. In a terminal, under the ~/anaconda3/envs/ directory:
    $ git clone https://github.com/fastai/fastai.git

  2. $ cd fastai/ (the environment.yml file is in this directory)

  3. In the ~/anaconda3/envs/fastai/ directory:
    $ conda env create -f environment.yml

1 Like

What architecture are you running? I'm just curious to know… I have been playing around with ResNets and VGG for a while and haven't faced any such issues.

Regards,

Gokkul

@jeremy When I started running the notebooks, I came across "cannot find module bcolz". So I installed bcolz individually. However, I got an error message saying bcolz was already installed. I removed everything and reinstalled the whole repo using the zip file download from GitHub instead of git clone, but no improvement. I found others discussing a similar problem on Paperspace a few days ago, but no solution in the forums. Any ideas? In the meantime, I will use AWS.

Hi Sarada,

BTW, it's not necessary to run the git clone command from the ~/anaconda3/envs directory as you describe in step 1. You can run it from anywhere on your hard drive!

I prefer to keep all the source code that I download in one place, under ~/source, so I would have run cd ~/source in step 1 instead.

I've found that sometimes when I can't install a package with pip install bcolz, I try conda install bcolz instead, and if the package is available through conda it usually works after that. Have you tried using conda to install bcolz?
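A sketch of that fallback, assuming you're inside the activated fastai env:

```shell
# Try pip first; if it fails (often due to building bcolz's C extensions),
# fall back to conda's prebuilt binary package.
pip install bcolz || conda install -y bcolz
python -c "import bcolz; print(bcolz.__version__)"   # verify the module imports
```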

1 Like

Sure, I had no problem with VGG-like models, but my input images were grayscale and relatively small. When I tried to switch to Xception (the Keras version), which is pretty deep, I needed to decrease the batch size substantially - same with ResNet, a smaller batch helped.
But currently I'm running a CycleGAN, and the OOM error causes a more serious problem - since the batch size is 1, I needed to decrease the training image size, practically halving it to make it work. The generated images don't look too bad, but maybe the result would be better if I were running at full size…

Thanks for posting that! One more step:

source activate fastai

You need to do that step every time you log in. Or else put it in your .bashrc.
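One way to make that automatic (appends a line to your .bashrc, so it runs in every new shell; assumes a default Anaconda setup):

```shell
# Run once: auto-activate the fastai env on login.
echo 'source activate fastai' >> ~/.bashrc
```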

2 Likes

Hi all,

For those of you who have been building a new box: I just finished installing the NVIDIA CUDA and cuDNN drivers last night. I had some issues getting the drivers working on my Ubuntu box, but found a very useful tutorial on the OpenCV website. If you are setting up your desktop build for the first time, I hope these notes are helpful.

If anyone else has any feedback, installation notes, or a better guide, I would be interested in their experiences as well. I know some of my fellow USF master's students have been re-configuring old computers to use as DL boxes.

Cheers,

Tim

Installation Guide covers:

  • installation of NVIDIA drivers on Ubuntu, specifically the CUDA toolkit and cuDNN
  • setup of Python environments for deep learning frameworks (ignore this if you want to use conda for package installations)

A couple of caveats:

  1. Know your framework / driver version compatibility: Before you start installing any of the software, note compatibility issues with Torch. From the website, the only links available are for CUDA 7.5 or 8.0, which are older versions. To make Torch run on CUDA 9, you have to clone a repo and install from source (a bit more complicated).
  2. Restart your computer after the drivers are installed: Once CUDA is installed, reboot your machine so the drivers load properly.
  3. Check versions between CUDA and cuDNN: Make sure the cuDNN and CUDA versions are matched correctly with the framework you want to use.
  4. Note Python version / framework compatibility: If you're ever interested in TensorFlow, make sure your Python version matches (sometimes TF is looking for 3.5 instead of the current 3.6).
  5. Recommend the .deb installation method: There are two ways of installing the NVIDIA CUDA drivers, the .deb option and the local run file option. IMO, the .deb (local) approach is much cleaner and easier to manage.
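A few commands for checking what actually got installed. The cudnn.h path is the usual .deb-install default and may differ on your system:

```shell
nvidia-smi                                 # driver version + detected GPUs
nvcc --version                             # CUDA toolkit version
grep -A 2 CUDNN_MAJOR /usr/include/cudnn.h # cuDNN version (header path may vary)
```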

Installing Deep Learning Frameworks on Ubuntu with Cuda Support

https://www.learnopencv.com/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/

My Rig:
Intel i5 (from 2011)
32GB RAM
NVIDIA GTX 1080
1 x 500GB SSD (Ubuntu)
Some other HDs for storage and a Windows boot

6 Likes