Personal DL box

miguel_perez · November 30, 2017, 10:35am

@radek, you probably already know this, but just in case: if I see that GPU utilization is low and GPU memory use is under, say, 50%, I increase the batch size, which usually makes training much more efficient.
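To make that check concrete, here is a minimal sketch that parses one line of output from `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` and flags when there is headroom. The function name and the 50% utilization threshold are assumptions for illustration (only the 50% memory figure comes from the rule of thumb above), so treat it as a starting point rather than a tuned recommendation.

```python
def has_batch_headroom(csv_line, util_threshold=50.0, mem_threshold=0.5):
    """Return True when both GPU utilization and memory use look low.

    csv_line is one line such as "23, 3012, 11178": utilization in percent,
    memory used in MiB, memory total in MiB.
    """
    util_str, used_str, total_str = [f.strip() for f in csv_line.split(",")]
    utilization = float(util_str)
    mem_fraction = float(used_str) / float(total_str)
    return utilization < util_threshold and mem_fraction < mem_threshold


# Hypothetical readings, not measurements from a real card.
print(has_batch_headroom("23, 3012, 11178"))   # low utilization, about 27% memory
print(has_batch_headroom("97, 10240, 11178"))  # busy GPU, about 92% memory
```

If the first case matches what you see during training, bumping the batch size and re-checking is cheap to try.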

About CPU: I agree that even if it is not the number one bottleneck in DL, it pays to have a good enough one. For some months this year I had a dedicated server with a Ryzen (8 cores, 16 threads) and was really happy with its performance; it benchmarks well as far as I know.

And lastly, I find CPU RAM to be the bottleneck many times, because data wrangling requires a lot of it (how much is inversely correlated with your coding efficiency). I have a 32GB laptop locally and a minimum of 64GB on rented servers… and I always find myself in situations where I need more. For a personal DL box, which I don’t have, I wouldn’t go below 128GB if possible.

Kind of beefy specs but, well Christmas is near already!

KevinB · November 30, 2017, 6:38pm

The nice thing about RAM is that you can upgrade it after the fact. So as long as you are mindful of wanting 128GB at some point, you can build a system with that in mind and just put something like 32GB in now, with the intention of adding the rest later. Just make sure not to fill every slot with small sticks (like 8x4GB) if you want to bump it up later.

Moody · November 30, 2017, 8:16pm

I removed everything and re-installed it again and again. Once it was working, I stopped.

sermakarevich · November 30, 2017, 8:19pm

That’s obviously the explanation of the day!!!

jackalack · December 1, 2017, 4:35pm

@beacrett What is your full DL box setup?

beacrett · December 1, 2017, 5:24pm

Currently:

Ryzen 5 1600 (6 cores, can overclock if desired, comes with cooler)
AM4 B350 chipset motherboard (B350 is the middle tier of their chipsets - it is worth it for the small price bump)
16GB DDR4 3200 RAM (fastest supported by my processor, going to get another 16GB)
250GB NVMe M.2 SSD
2TB HDD
1080 ti (EVGA GeForce GTX 1080 Ti SC Black Edition - very happy with this so far - great cooling)
750 watt modular power supply (would need to get a bigger one if adding a second GPU)
Dual boot Windows 10 / Ubuntu 16.04 LTS

IMHO, it’s worth getting the fastest RAM supported by your CPU (within reason on cost). Keep track of the model and its timings: you may need to manually change settings in the BIOS to ensure it is running at full speed, and you want any new RAM you buy to match the speed and timings for optimal performance (try to order the same model to keep it simple).

sermakarevich · December 1, 2017, 7:13pm

There is no turning back. Delivered today, “feeling like a little kid”.

sermakarevich · December 1, 2017, 11:02pm

Has anybody tried to activate a conda virtualenv and run jupyter notebook from within a crontab job? source activate does not work for me; source bin/activate throws me to the root user and does not activate anything. Nothing useful in the google forest so far.

rob · December 2, 2017, 1:12am

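I haven’t, but maybe you can try running a script as a login shell:

#!/bin/bash -l
# -l makes bash a login shell, so the login profile files are read
cd fastai
# activate in this shell (a trailing & would activate in a throwaway subshell)
source activate fastai 2>/dev/null
# start the notebook server in the background and keep it alive after the script exits
nohup jupyter-notebook 2>/dev/null &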

That has usually solved my “this isn’t working in cron” woes in the past

sermakarevich · December 2, 2017, 4:14pm

Thanks @rob, this is what worked for me link

UPDATE: no, that did not work either. Can’t activate the conda env from cron.
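One approach that sidesteps activation is to call the environment's own executable by absolute path, since cron starts jobs with a minimal environment and no conda setup on `PATH`. Below is a minimal sketch of that idea. The conda root (`~/anaconda3`), the env name, the notebook directory, and both helper names are placeholders, and this has not been tested on the setup described above.

```python
import os
import subprocess


def jupyter_command(conda_root, env_name):
    """Build the argv for the env's own jupyter-notebook executable."""
    exe = os.path.join(conda_root, "envs", env_name, "bin", "jupyter-notebook")
    return [exe, "--no-browser"]


def launch(cmd, notebook_dir):
    """Start the server detached, so it outlives the cron job that spawned it."""
    return subprocess.Popen(cmd, cwd=notebook_dir,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL,
                            start_new_session=True)


# Placeholder paths; adjust to wherever conda and the notebooks actually live.
home = os.path.expanduser("~")
cmd = jupyter_command(os.path.join(home, "anaconda3"), "fastai")
print(cmd)
# launch(cmd, os.path.join(home, "fastai"))  # uncomment on a machine where the path exists
```

A crontab entry would then run this file with the system Python. Tools invoked from inside notebooks may still expect the env's `bin` directory on `PATH`, so this is a workaround rather than a full activation.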

manikanta_s · December 15, 2017, 7:55am

I tried setting up a Deep Learning machine on Azure for fast.ai and it’s working fine.

During the setup I hit a problem that seems to be a Jupyter issue.

While following the steps in the readme, even after creating and activating the fastai environment, I could not find the fastai kernel in Jupyter.

I searched around, and many people have faced similar issues with Jupyter and conda.

I resolved it by manually installing the kernel after activating the fastai environment:

python -m ipykernel install --user --name fastai --display-name "Python (fastai)"

Please let me know if anyone else faced the same issue.
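For anyone curious what that command does: it registers a kernel spec, which is essentially a small `kernel.json` file in a per-user kernels directory that points at the environment's Python. The sketch below writes an equivalent file into a temporary directory just to show its shape. The helper name and the interpreter path are hypothetical, and the real command remains the right way to register the kernel.

```python
import json
import os
import tempfile


def write_kernel_spec(kernels_dir, name, display_name, python_exe):
    """Write <kernels_dir>/<name>/kernel.json and return its path."""
    spec = {
        "argv": [python_exe, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    spec_dir = os.path.join(kernels_dir, name)
    os.makedirs(spec_dir, exist_ok=True)
    path = os.path.join(spec_dir, "kernel.json")
    with open(path, "w") as f:
        json.dump(spec, f, indent=1)
    return path


# Demo in a throwaway directory; the interpreter path is a placeholder.
demo_dir = tempfile.mkdtemp()
spec_path = write_kernel_spec(demo_dir, "fastai", "Python (fastai)",
                              "/path/to/envs/fastai/bin/python")
with open(spec_path) as f:
    print(f.read())
```

If the kernel still does not show up after the real install, `jupyter kernelspec list` shows which specs Jupyter can see and where they live.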

lymitshn · January 4, 2018, 9:06pm

Hello, I used the Paperspace script on a fresh install of Ubuntu 16.04 with a 1070. Everything was OK until I tried to run the learning cell with “resnet34”. I monitored my system: it uses all the RAM (8G RAM and 1G swap) until the kernel shuts itself down.

But it doesn’t use VRAM at all; I checked with the nvidia-smi tool. I’m guessing it doesn’t use the GPU at all? But still, shouldn’t 8G of RAM be enough? I also added the kernel with the command posted above and switched to that kernel, still no luck.

So what can be the problem?

ecdrid · January 4, 2018, 9:11pm

Adding the notebook and some screenshots from before and after might help other forum members respond; otherwise it’s like shooting an arrow in the dark…
Thanks …

lymitshn · January 4, 2018, 9:19pm

It is the Lesson1 notebook.

jeremy · January 4, 2018, 10:37pm

@lymitshn I wouldn’t suggest using your own box for learning this course - better to use the fast.ai AMI on AWS, Paperspace, or Crestle. Once you’re comfortable with the basic techniques then you can come back to getting your own machine working. You’ll know enough at that point to understand how to debug your issues and ask for help in a way that we can be useful to you.

lymitshn · January 6, 2018, 5:34pm

Thank you for the suggestion, but I really want to run on my local machine.
I created a new env and verified that torch uses the GPU. I also added 6G of swap, and it seemed to work this time. It ran until hitting 12% (slowly…), but it was using only 800 MB of VRAM and 16% GPU power at peak, and after consuming all DRAM and swap the kernel restarted itself.
It can clearly access the GPU but still uses a lot of DRAM. Is this how the model is supposed to work? Or is something wrong with my setup?
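For reference, a quick way to do the "torch uses GPU" check from Python is sketched below. It assumes PyTorch may or may not be installed in the active environment, and the `gpu_report` helper is made up for illustration.

```python
def gpu_report():
    """Summarize whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    names = [torch.cuda.get_device_name(i) for i in range(count)]
    return {"torch_installed": True, "cuda_available": available, "devices": names}


print(gpu_report())
```

A `True` here only shows that the device is visible to PyTorch; it says nothing about where host RAM is going.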

gdc · January 11, 2018, 8:47pm

Hi! I ran into the same issue. The solution was just given on Wiki: Lesson 1. It was not a setup issue (at least in my case); reducing the number of workers was necessary because loading/transforming the data is done on the CPU in main RAM:
data = ImageClassifierData.from_paths(......, num_workers=1)
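The reasoning, as far as I understand it: each worker loads and transforms images in parallel and keeps its in-flight batches in main RAM, so host memory use tends to grow with the worker count. A rough, hypothetical heuristic for choosing a cap is sketched below. The per-worker figure is something you would measure yourself (for example by watching `top` while one worker runs), and neither the function nor the numbers come from the fastai library.

```python
def max_workers_for_ram(free_ram_gb, per_worker_gb, cpu_count, reserve_gb=1.0):
    """Largest worker count that fits in free RAM, between 1 and cpu_count."""
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    # Keep some RAM back for the notebook kernel and the OS.
    usable = max(free_ram_gb - reserve_gb, 0.0)
    fits = int(usable // per_worker_gb)
    return max(1, min(fits, cpu_count))


# Hypothetical numbers for an 8 GB machine with about 6 GB free.
print(max_workers_for_ram(free_ram_gb=6.0, per_worker_gb=2.5, cpu_count=8))
```

The result would be passed as `num_workers`; on a small machine it will often come out as 1, which matches the fix above.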

beecoder · February 17, 2018, 4:11pm

Regarding a personal DL box, I’m seeing some Presidents’ Day deals on Windows machines with a dual-GPU setup (two 1050 Ti or 1070 cards). The big price increase of the 1080 Ti since last Nov hasn’t helped.

I’m weighing the benefit of having 2 cards to run two experiments versus a single faster 1080 Ti. The main use case is being ready for part 2 of this course and for my own learning.

Does anyone have such a setup or know the pros and cons?
I’m wondering if I will have to do a dual-OS install, since the box comes with Windows, and whether I’ll be able to access both cards fully.

FourMoBro · February 17, 2018, 4:19pm

I have 2x 1080 Ti in a dual-boot setup with separate hard drives.

I would recommend a dual-boot setup only if you can install each OS separately so there is no interference. I have posted several links regarding this.

As far as multiple GPU cards go: if you can afford it, great, but the fastai library will not use both cards while training at this time. However, there are other DL frameworks that can use all available GPUs with no real setup. I don’t run a second experiment while one is training, so for fastai duty one card drives the display and the other trains.
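For the arrangement where one card drives the display and the other trains, a common way to pin a process to a single card is the `CUDA_VISIBLE_DEVICES` environment variable, set before any CUDA library is initialized. Here is a minimal sketch, assuming the training card is index 1 (the index on your machine may differ; `nvidia-smi` lists them) and using a helper name made up for this example:

```python
import os


def pin_to_gpu(index):
    """Restrict CUDA in this process to one physical GPU.

    Must run before torch (or any other CUDA library) initializes CUDA.
    """
    # Order devices by PCI bus ID so the index matches nvidia-smi's listing.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


pin_to_gpu(1)  # assumption: card 1 is the one not driving the display
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Inside the pinned process the chosen card shows up as device 0, so training code needs no other changes.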

beecoder · February 17, 2018, 4:41pm

Thanks! I did look at your dual-boot link. Right, I’ve seen people trying to use both cards for training without much success. I’m more interested in the interleaved approach, i.e. training on one card and using the other card for some interactive/lightweight work.

I saw the thread on fastai installation on Windows, but I’m not sure how mature it is. If I can get that to work, a single OS would suffice. I’m still a noob in this respect, so I have questions about whether one card has to be used for display purposes, whether Windows gives full access to both GPUs, etc.

While I’ve got you here, did NVMe drives help performance significantly? And how much RAM did you use? I’m wondering if 32GB is a must.