MAC with GPU?

Thanks Mariam. Yes, I tried the MXNet link you posted above, and it worked for my GPU.

But still lesson1.ipynb is not using my GPU.
0%| | 1/363 [00:32<3:16:07, 32.51s/it]

So I am going to build PyTorch from source, let’s see how it goes …

Hi @Mariam,
For some reason building PyTorch from source is not working for me (it crashes my machine when I try to use the full dataset in the fastai lessons, but works for the sample). Can you please explain the steps you took to install through “pip”?

Hi @shankarrajus,

pip install fastai

will install PyTorch for you. However, that PyTorch build will not be CUDA-enabled.
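
If you want to confirm which build you got, a quick check like this should tell you (just a sketch; torch.cuda.is_available() is the standard call):

import torch
print(torch.cuda.is_available())  # False means the install is CPU-only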

I ended up installing Ubuntu 16.04 alongside macOS (dual boot, not through VirtualBox). I found the process relatively straightforward, and PyTorch is now using the GPU for computations.

Thanks @Mariam

I reduced sz to 100 (in lesson1.ipynb) and then it worked. But it looks like an out-of-memory issue is happening, as per the comments in this PyTorch issue: https://github.com/pytorch/pytorch/issues/4926

Dual booting into Ubuntu never occurred to me. I think I am going to explore that option. If you have any link on how to do a dual-boot installation, please pass it on …

@shankarrajus

After creating a bootable USB with Ubuntu, I followed this link.

Ubuntu was installed, but my wireless card was not working. To fix this I followed user2649966's answer.

Next, I installed Anaconda.

Finally, I followed these instructions to install CUDA.


“sz” refers to the image size, and you do not want to change it for this model at this time. If you need to change the batch size from its default value of 64, you can pass in the bs= parameter like so…

data = ImageClassifierData.from_paths(PATH, bs=32, tfms=tfms_from_model(arch, sz))
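
If 32 is still too large for your card, you can pass an even smaller value; it is the same call, and a smaller bs just means more (smaller) batches per epoch, for example:

data = ImageClassifierData.from_paths(PATH, bs=16, tfms=tfms_from_model(arch, sz))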

Thanks @FourMoBro, I have yet to explore reducing the batch size, but I believe it will increase my execution time many-fold :frowning:

Thanks @Mariam
At last, I successfully set up Ubuntu on my Mac and ran the first lesson with a few hiccups. These are the steps I followed; they might help others:

  1. I used this link for the dual boot: https://www.lifewire.com/dual-boot-linux-and-mac-os-4125733 (it is very detailed).
  2. In Ubuntu, search for “Additional Drivers”, then select the proprietary drivers for the GPU and wireless card.
  3. Then I just ran conda env update from the fastai repo to install all the CUDA drivers and other libraries.

Though I am still getting an “out of memory” issue while running lesson1.ipynb, I am using this temporary fix:

  1. Thanks to Jeremy, I now check the following first to make sure everything is set (collected into one runnable snippet after this list):
    torch.cuda.is_available()
    torch.backends.cudnn.enabled
    torch.cuda.current_device()
    torch.cuda.device(0)
    torch.cuda.get_device_name(0) # This should print the GPU name

  2. Then I run up to “learn.save(‘224_lastlayer’)”, and then I restart my kernel.

  3. Run torch.cuda.empty_cache() after every training run; this helps a lot in freeing GPU memory (but ~900MB is still always in use, and I don’t know how to clear it).

  4. Then I load the saved model and proceed from there. I noticed that if my free GPU memory is at least 1GB it runs fine (mine is a GT 750M with only 2GB).
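
For reference, here are the checks from step 1 and the cache release from step 3 as one runnable snippet (a sketch only; the fastai-specific save/restart/load parts are left out):

import torch

# step 1: sanity checks before training
print(torch.cuda.is_available())        # should be True
print(torch.backends.cudnn.enabled)     # should be True
print(torch.cuda.current_device())      # usually 0
print(torch.cuda.get_device_name(0))    # e.g. "GeForce GT 750M"

# step 3: after each training run, release cached GPU memory back to the driver
torch.cuda.empty_cache()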

Only now am I going to start on the first lesson :slight_smile:


I’ve spent quite some time learning about the new hardware; it is fascinating, and I can share some superb blog posts about it. However, after weeks of exploration, I ended up purchasing and assembling a PC based on the pugetsystems DIGITS build, with small modifications. I would probably buy from pugetsystems if they were selling to the EU.

The PC has a motherboard with the older X99 socket. However, this is the only Intel-based option that lets you put in 4 GPUs with x16 lanes for each.

Here is my hardware selection

  • I have 4 GPUs, but I can’t put them on pcpartpicker, as the cards don’t support 4-way SLI.
  • I would change the chassis to a BE QUIET! DARK BASE 900; it has better quality and looks.
  • Make sure you buy the MSI Aero OC card or a Founders Edition; these are the only cards I could find that have air intakes on the rear. Initially, I purchased 4x Asus 1080 Ti Turbo because they are blower cards, but the intake is only on the side, so they get very hot when used intensively.

If you are happy with 2 or 3 cards, consider going for Ryzen or Threadripper. The CPU isn’t as important as the GPU. However, that is only true if you can write optimised code, and when you are training your models the CPU is usually the least of your concerns, so having a fast CPU and a good SSD helps.


You may be mistaken, or the information is not quite accurate. While X99 boards may fit 4 GPU cards, they do not all run at 16 lanes each when all are in use. The processor dictates how many lanes will be used. As of now I have yet to see an Intel processor with more than 44 lanes, and my 6850K only has 40 lanes. If you are running 4 cards on X99, you will probably be running them at x8 each for 32 lanes, not the implied 64.

My motherboard has a switch for PCIe lanes, and as far as I understood, each card can use x16 lanes if they are available and other cards aren’t using them.
According to the Asus specs:
7 x PCIe 3.0/2.0 x16 (single x16 or dual x16/x16 or triple x16/x16/x16 or quad x16/x16/x16/x16 or seven x16/x8/x8/x8/x8/x8/x8) (https://www.asus.com/Motherboards/X99-E-10G-WS/specifications/)

Have a look at the block diagram that shows how PLX 8747 and QSW 1480 are used together to multiplex 32 lanes to 64


You still only have 32 lanes to the processor at any given time between those units. Even if everything were 4x16, and if they were maxed out with data, there would be a waiting game. Now this is probably all a moot point, as I have yet to see any studies (they could be out there) where the type of DL we do here fills up 16 lanes for just one card, let alone 4. If we can get to that point of hardware utilization, I would look into Threadripper, as it can support more lanes than the Intel units at this time.


Indeed, if all cards are actively using the 32 lanes you are out of luck; fortunately that isn’t happening with deep learning, since a forward + backward pass on a batch usually takes much longer than the transfer time. You are right that writing algorithms that utilise more than one GPU is hard. But I usually run 4 experiments at once, which greatly speeds up hyperparameter tuning, and I’d like to be able to run them at full speed, hence the choice of the X99-E-10G.
Moreover, I’ve seen a study that claims there is no difference between x16 and x8 for deep learning: https://www.pugetsystems.com/labs/hpc/PCIe-X16-vs-X8-for-GPUs-when-running-cuDNN-and-Caffe-887/

Re: Threadripper, I was considering buying it and I think it would work really well, but I wasn’t able to find a motherboard that would give me 4 GPUs and 10G Ethernet at the same time. I bet they will develop such motherboards in the future; if you find one, please share.

The reason for 10G and 4 GPUs is that I intend to stack such computers together in the future once my company takes off.


Is it possible to use an Intel graphics card instead of an NVIDIA GPU?

Today is June 21, 2018.

I’d like to have the environment installed on my macOS machine, which has an NVIDIA GeForce GT 750M with 384 CUDA cores, according to this: https://www.anandtech.com/show/6873/nvidias-geforce-700m-family-full-details-and-specs/2

Then when I executed the command conda env update, the error shown was:

ResolvePackageNotFound:

  • cuda90

I did a Google search and found this page; it seems fastai does not support macOS, according to @jeremy: https://github.com/fastai/fastai/issues/84

Honestly it is quite frustrating not to be able to run this locally while having capable hardware… I haven’t given up yet… I kept searching for CUDA on macOS and found this:
https://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html
The guide shows how to install CUDA on macOS for C/C++ development; after all, the hardware is CUDA capable.


Here you can download the CUDA Toolkit for macOS:
https://developer.nvidia.com/cuda-downloads

Installation screen of the CUDA Toolkit on macOS:

It is possible to make use of the CUDA cores on macOS, at least from C/C++. I think what we are missing is the Python library that leverages the NVIDIA CUDA Toolkit libraries and tools on macOS.

My question right now is: why hasn’t this Python cuda90 package been released for macOS?

I kept searching on Google for “cuda python development on mac” and found this:

NUMBA Project
http://numba.pydata.org/

Through this approach using http://numba.pydata.org/, from the Numba main page: “Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack.”
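
Just to illustrate what using the GPU through Numba looks like, here is a minimal sketch of a CUDA kernel (assuming numba and a working CUDA toolkit are installed; it is not tied to fastai in any way):

from numba import cuda
import numpy as np

@cuda.jit
def add_one(x):
    i = cuda.grid(1)          # absolute index of this thread
    if i < x.size:
        x[i] += 1.0

x = np.zeros(32, dtype=np.float32)
d_x = cuda.to_device(x)       # copy the array to the GPU
add_one[1, 32](d_x)           # launch 1 block of 32 threads
print(d_x.copy_to_host())     # prints an array of ones if the kernel ran on the GPU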

Then I think the first comment by @piotr.czapla is no longer valid: Macs with CUDA-capable GPUs exist today… and it is possible to have Python rely on CUDA using the Numba project… although how to integrate this into the fastai infrastructure any time soon is another story… the idea I’m trying to convey is that this bridge is feasible… but I’m not sure how to approach it… maybe @jeremy can give more up-to-date advice on this?
Thank you.

UPDATE:
I haven’t got an answer and I haven’t given up.

I have managed to compile PyTorch for my computer in the hope of getting CUDA support. I followed all the macOS instructions detailed here: https://github.com/pytorch/pytorch to compile from the cloned git repo, and I was able to compile without error. The compilation took around 45 minutes. My machine specs:

[screenshot: machine specs]

If you go to this page:
http://www.nvidia.com/Download/index.aspx?lang=en-us to download the drivers for the video card… it seems they are unavailable for macOS, ahh!!!

Then go to the NVIDIA CUDA Driver for Mac link shown…
http://www.nvidia.com/object/mac-driver-archive.html
and click on the latest version.

After installing it on my machine I got this “nice” update-required splash screen, which is shown every time you reboot:

[screenshot: update-required splash screen]

The CUDA driver is installed, but the driver for the video GPU is outdated.
Then I kept searching on Google for the term “geforce gt 750m driver mac” and found this link:

My macOS version is 10.13.5; this is the direct download page:
http://www.nvidia.com/download/driverResults.aspx/134834/en-us

It was released this month; after installing it a reboot is required… and the update-required message is no longer shown.


I also get the driver menu.


When executing this line, torch.cuda.is_available() now returns True.


Previously it didn’t… so far so good…
Now I face another error

and a long stack trace that ends with

When compiling PyTorch there was one step I skipped: the installation of NVIDIA cuDNN v6.x or above.

Leads you here


then here
https://developer.nvidia.com/rdp/cudnn-download
and the direct link to the file…
https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.1.4/prod/9.2_20180516/cudnn-9.2-osx-x64-v7.1
And the installation instructions, which are very important for macOS:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-mac

After verifying

To Recap:

The CUDA Development Toolkit is installed on my computer at /Developer/NVIDIA/CUDA-9.2/


Then my hope is to recompile PyTorch after installing NVIDIA cuDNN…

UPDATE 2:

I finished compiling PyTorch without errors, but when executing these lines my computer becomes unresponsive, ahhh!!! I had to force a reboot: COMMAND, CONTROL and POWER buttons.

I have to research this explicit PyTorch message…

Also, prior to compiling PyTorch I set these variables:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$CUDA_HOME/lib"
export PATH="$CUDA_HOME/bin:$PATH"

And this was the line to compile:

MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ python setup.py install

I changed the deployment target to match my macOS version (10.13).
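
Once a build finishes, a quick check like this (a sketch, run from the same environment; I assume torch.backends.cudnn.version() is available in this build) should tell whether CUDA support actually made it in:

import torch
print(torch.__version__)                # the locally built version
print(torch.cuda.is_available())        # True if the build found the CUDA toolkit
print(torch.backends.cudnn.version())   # not None once cuDNN is linked in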

I’m reviewing the output of the compilation to see if I can recognize any error or warning message worth researching or reporting… :confused:

This is the log, if you want to check it out with me. :confused:

pytorchWithCudaInstallationOutput.pdf (104.7 KB)


I have a be quiet! Dark Base 900 case and would recommend it. Amazingly quiet. My journey with it and my builds can be seen here.

I am looking at moving my non-standard-sized motherboard back into the Dark Base case and adding waterblocks for the GPUs, as I’m finding the current air-cooled rig a little noisy.


I’ve managed to get fast.ai working with my setup (Mac Pro 2010, NVIDIA GTX 980, NVIDIA web driver v378.05.05.25f09, macOS version 10.12.6).

The steps are (assuming Anaconda or Miniconda is installed):

  1. Install the NVIDIA CUDA 9.0 driver/toolkit and samples. This results in driver v9.0.197 showing up in System Preferences. Follow the instructions here.

  2. Install/downgrade to Xcode 8.3.3 (required to build the samples to test that CUDA is working).

  3. Build 1_Utilities/deviceQuery and run the sample, ensuring no errors are reported and the GPU is detected. There is no point in continuing if you can’t get the samples to run.

  4. Install cuDNN for CUDA 9.0.

  5. Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to DYLD_LIBRARY_PATH.

  6. Clone the PyTorch repo and check out the v0.4.0 release.

  7. Build PyTorch by doing the following in a terminal:
    7.1. Create a fastai virtual environment from miniconda (conda env create --name fastai)
    7.2. Activate the fastai virtual environment (source activate fastai), and make a note of $PATH.
    7.3. Run: export CMAKE_PREFIX_PATH=[anaconda root directory], where the anaconda root directory is the fastai virtual env bin dir in $PATH (e.g. /Users/blah/miniconda3/envs/fastai/bin).
    7.4. Run: conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
    7.5. Run: MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
    7.6. If CUDA is set up correctly, you should see it building CUDA-related files from 16% onwards.

  8. Clone fast.ai repo from https://github.com/fastai/fastai/

  9. From the fast.ai root dir, edit environment.yml and remove the following entries:
    – pytorch
    – cuda90
    – pytorch <= 0.4.0 check

  10. Run “source env update” to install all the required dependencies, using the fastai virtual env created earlier.

  11. That’s it; it took roughly 20 seconds to run the initial ResNet34 pretrained model from lesson 1.


Awesome work fighting to get this working, Jose, mitsujin and all! I’m considering getting an external GPU for my MacBook to run fastai locally. I’m not sure if I’ll dare yet, but this gave me some hope that it can be done :-).

Has anyone been able to follow mitsujin’s steps and repeat the success?

You can definitely make it work, but expect some fiddling with the graphics driver. The NVIDIA web driver has to exactly match the version of the OS you are running. This can be an issue if you install e.g. a minor system update before NVIDIA releases a new web driver. In that case, the web driver will be disabled and you will have to wait until an update is available; this happened to me when my macOS updated from 10.13.2 to 10.13.3.

Also, your mileage regarding usefulness may vary depending on your hardware configuration. My setup was a mid-2013 MacBook Pro with an NVIDIA 650M. PyTorch/fast.ai was using GPU acceleration, but you needed very small batch sizes to avoid GPU out-of-memory errors. What made it worse was that the whole system was completely unresponsive while running any kind of larger training until the training was done. This made it very impractical to use.

On a somewhat related note, I’m following the progress of AMD’s ROCm initiative, hoping that eventually I can use my current Mac with an AMD card for training. I read recently that the ROCm PyTorch fork is working, but I haven’t had a chance to try it out yet. The repo is here: https://github.com/ROCmSoftwarePlatform/pytorch


I’m another success story, with a Mac mini (10.13.6, 2.6GHz i5, 16GB) and an external eGPU with a 1080 Ti 11GB.

Pros: it works and it’s reasonably performant.

Cons: You’re stuck at 10.13 (no CUDA for Mojave…)
With my setup I’m HIGHLY CPU/disk bound now.
Upgrading to fastai v1 is a little daunting; I think it’s mostly a matter of getting PyTorch v1 to compile, but I’m not sure. I may make a new conda env and give it a try.

I may also just give up and see if I can push to get some more hardware.