XGBoost with GPU support

I was trying to compile XGBoost on an AWS instance following the instructions here:

However, the first step, “building the shared library”, has me stumped. My question is: in which
directory on Ubuntu do I find it? Even if I skip this step and go ahead with the recursive git clone, I get a distutils-related
error, and if I ignore that, I am unable to import xgboost.
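For reference: in builds of this era the compiled library is typically written to `lib/libxgboost.so` inside the repository, though the exact location may vary between versions and build methods. A minimal sketch that lists any built copies under a checkout, assuming the repository lives at `~/xgboost` (a hypothetical path); `find_xgboost_libs` is an illustrative helper, not part of XGBoost:

```python
from pathlib import Path


def find_xgboost_libs(root):
    """Return every libxgboost shared-library file found under `root`, sorted."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    # rglob searches the whole tree, so stale copies from earlier builds show up too.
    return sorted(str(p) for p in root.rglob("libxgboost*.so*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical clone location; adjust to wherever you cloned the repository.
    for path in find_xgboost_libs("~/xgboost"):
        print(path)
```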

I didn’t have any trouble using pip install, but that build does not let me take advantage of GPU support, which is what
I want to try out on the AWS instance. Has anybody installed XGBoost on AWS (Ubuntu 16.04.3) with GPU support?

I just did this by following the “Build with GPU support” instructions on that page.

In one test, an i7-5930K (6 cores, 3.5 GHz) ran some code (a small CV grid search) in 2 minutes using 12 threads, and the GPU completed it in 1 minute.
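A comparison like this boils down to wall-clock timing of the same job under two configurations. A minimal sketch of such a harness, assuming each run is wrapped in a zero-argument function; `time_run` and `speedup` are hypothetical names. With the figures above, 2 minutes on the CPU versus 1 minute on the GPU is a 2x speedup.

```python
import time


def time_run(fn, repeats=1):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate run is than the baseline run."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Figures from the post: 2 minutes with 12 CPU threads, 1 minute on the GPU.
    print(speedup(120.0, 60.0))  # 2.0
    # In practice, fn would wrap a call such as xgb.cv(...) with CPU or GPU params.
```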

By the way, I read that you can use multiple GPUs by building with NCCL, though I haven’t been able to get that to work yet. Here are the steps I followed to install XGBoost with GPU and NCCL support, in case anyone’s interested:

  1. Install NCCL

    • Follow Nvidia’s guide for 16.04. It doesn’t mention where to download one needed .deb file, so:
    • Download nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb from here, assuming you have 16.04
    • Complete the steps in the guide
  2. Follow the build instructions and add the USE_NCCL parameter

    • cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON
  3. In the code, supply the appropriate parameters as mentioned on this page and this issue

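     import xgboost as xgb
     
     # train_dmat is assumed to be an xgb.DMatrix built from your training data.
     # 'n_gpus' applies to the XGBoost versions of this era; later releases deprecated it.
     params = {'tree_method': 'gpu_hist', 'n_gpus': 2, 'eta': 0.01, 'subsample': 0.7,
               'colsample_bytree': 0.8, 'max_depth': 3, 'min_child_weight': 3,
               'objective': 'binary:logistic', 'seed': 0}
     
     cv_xgb = xgb.cv(params=params, dtrain=train_dmat,
                     num_boost_round=3000, nfold=5, metrics=['error'],
                     early_stopping_rounds=100)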
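A stray space inside a key (for example `' subsample'` instead of `'subsample'`) is easy to miss, and XGBoost may not recognize such a parameter. A small hypothetical helper that flags and strips such keys before the dict is passed to `xgb.cv`; it uses only the standard library and is not part of XGBoost:

```python
def check_param_keys(params):
    """Return the keys that have leading or trailing whitespace."""
    return [key for key in params if key != key.strip()]


def normalize_param_keys(params):
    """Return a copy of params with whitespace stripped from every key."""
    bad = check_param_keys(params)
    if bad:
        print(f"warning: stripping whitespace from keys {bad!r}")
    return {key.strip(): value for key, value in params.items()}


if __name__ == "__main__":
    params = {'tree_method': 'gpu_hist', ' subsample': 0.7, 'max_depth': 3}
    print(check_param_keys(params))  # [' subsample']
    print(normalize_param_keys(params))
```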
    

If I set n_gpus to more than 1, I get this error:

Check failed: device_ordinals.size() == 1 (2 vs. 1) XGBoost must be compiled with NCCL to use more than one GPU

I did reinstall the python XGBoost tools from source after rebuilding XGBoost with NCCL turned on.
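One way to confirm that the rebuild really had NCCL enabled is to look at the CMake cache in the build directory, where entries are stored as `NAME:TYPE=VALUE` lines. A minimal sketch, assuming the build directory is `~/xgboost/build` (hypothetical) and that the option is recorded under the same name passed on the command line, `USE_NCCL`:

```python
from pathlib import Path


def read_cmake_cache(cache_file):
    """Parse a CMakeCache.txt into {name: value}, skipping comments and blank lines."""
    entries = {}
    for line in Path(cache_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        entries[key.split(":", 1)[0]] = value  # drop the ":TYPE" suffix
    return entries


def option_enabled(entries, name):
    """True if the cached value is one of CMake's common true constants."""
    return entries.get(name, "").upper() in {"ON", "YES", "TRUE", "Y", "1"}


if __name__ == "__main__":
    cache_path = Path("~/xgboost/build/CMakeCache.txt").expanduser()  # hypothetical
    if cache_path.is_file():
        cache = read_cmake_cache(cache_path)
        print("USE_CUDA:", option_enabled(cache, "USE_CUDA"))
        print("USE_NCCL:", option_enabled(cache, "USE_NCCL"))
    else:
        print("no CMake cache at", cache_path)
```

This only shows how the build was configured; it does not prove that Python is loading that build's library.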

By the way, I found this guide/walkthrough to be a pretty good introduction to XGBoost.

I was able to build with no problem, but had the hardest time getting Jupyter Notebook to use the GPU-enabled version. I finally found a solution: after building, append the path/to/xgboost/python-package to sys.path. Full details in this gist.
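The `sys.path` workaround amounts to making the source tree's `python-package` directory visible before any pip-installed copy. A minimal sketch; `~/xgboost/python-package` is a hypothetical location, and `prefer_package_dir` is an illustrative name. Unlike a plain append, it prepends the directory so the checkout takes precedence over site-packages. If xgboost was already imported in the running kernel, restart the kernel first, because Python caches imported modules.

```python
import importlib.util
import os
import sys


def prefer_package_dir(path):
    """Move `path` to the front of sys.path so imports find it before site-packages."""
    path = os.path.abspath(os.path.expanduser(path))
    # Drop any existing copies so the directory appears exactly once, at the front.
    sys.path[:] = [entry for entry in sys.path if entry != path]
    sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    # Hypothetical checkout location; run this before the first `import xgboost`.
    prefer_package_dir("~/xgboost/python-package")
    spec = importlib.util.find_spec("xgboost")
    print(spec.origin if spec is not None else "xgboost not found")
```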

PS: I also found that if I put export PYTHONPATH=/path/to/xgboost/python-package:$PYTHONPATH in my .bashrc file, I don’t need to append it to sys.path in each new notebook.
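To check that the `export` in `.bashrc` is actually visible to a given Python process (for example, the one behind a notebook kernel), you can inspect the environment from inside that process. A minimal sketch; `pythonpath_entries` and `pythonpath_contains` are hypothetical helper names, and the directory checked is an assumed location:

```python
import os


def pythonpath_entries(environ=None):
    """Split PYTHONPATH into its non-empty directory entries."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def pythonpath_contains(directory, environ=None):
    """True if `directory` (with ~ expanded) appears in PYTHONPATH."""
    target = os.path.abspath(os.path.expanduser(directory))
    return any(os.path.abspath(entry) == target
               for entry in pythonpath_entries(environ))


if __name__ == "__main__":
    print(pythonpath_entries())
    print(pythonpath_contains("~/xgboost/python-package"))  # assumed location
```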


I found a bug in XGBoost’s GPU support, which has since been fixed. So I rebuilt XGBoost with GPU support and re-ran the bug-reproduction script from Python in the terminal. No bug.

Then I ran it from Jupyter Notebook, after restarting my computer completely. The bug remained.

So it seems that Jupyter Notebook is still using an old build? But I don’t see how that’s possible; I don’t even see where the old build could be on my system.
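When a notebook and the terminal seem to disagree, it can help to print, from inside each, which interpreter is running and which file an import resolves to; a second installation shows up as a different path. A minimal sketch using only the standard library (`describe_import` is a hypothetical name); for `xgboost`, the origin path shows which copy of the Python package is picked up:

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report the running interpreter and where `module_name` resolves to."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run this in both the terminal and the notebook, then compare the output.
    print(describe_import("xgboost"))
```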

Googling, I found lots of cases where an import statement couldn’t find a package from Jupyter but could from the terminal (example), but nothing where Jupyter wasn’t picking up a new .so file.

UPDATE: It’s not Jupyter or conda; the bug remains when running from the terminal after all. It turns out to be an intermittent problem.

FURTHER UPDATE: The XGBoost bug appears to be fixed now (3 April 2018).


Has anyone tried to build XGBoost or LightGBM with GPU support on a Paperspace machine?

In retrospect, all I did to get this to work was follow the directions. The docs give four different ways to make sure Python uses the GPU-enabled XGBoost. When I first approached this, I read too quickly and thought they were steps 1-4 rather than options 1-4. It turns out that what ended up working for me was option 2.
