Setting up GPU for fastai_v3

Use this thread if you are trying to set up your own GPU with the current fastai and PyTorch libraries.

Hi,
I am currently using a GeForce 1080 Ti GPU and have successfully installed the fastai package, but it does not recognise the NVIDIA/CUDA drivers. As a result the code runs on the CPU instead of the GPU, which makes it very slow. I am using CUDA 9.2 with cuDNN 7.3.1. I get the following error when trying to access the GPU.

A quick Google search suggests that the NVIDIA driver version should be 396, whereas mine is currently 390. Can anyone help with how to update from 390 to 396?
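
For anyone debugging the same symptom, a quick way to confirm whether PyTorch can see the GPU at all is something like the sketch below. The function name gpu_report is made up for illustration; the torch calls it uses (torch.cuda.is_available, torch.version.cuda, torch.backends.cudnn.version, torch.cuda.get_device_name) are standard PyTorch attributes, and the import is guarded so the script still runs where PyTorch is missing.

```python
def gpu_report():
    """Collect what PyTorch reports about CUDA (hypothetical helper, not part of fastai)."""
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "cuda_version": None,
        "cudnn_version": None,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return report

    report["torch_installed"] = True
    # CUDA version PyTorch was built against (None for CPU-only builds)
    report["cuda_version"] = torch.version.cuda
    # False usually points at a driver problem or a CPU-only build
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["cudnn_version"] = torch.backends.cudnn.version()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False while cuda_version is set, PyTorch was built with CUDA support but cannot reach the GPU, which is consistent with a driver that is too old.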

Did you try reinstalling the drivers? I faced a similar problem on Paperspace: it was not using the GPU to train. I reinstalled the drivers, restarted the machine, and it worked just fine.

apt-get install cuda-drivers

Hi all,
I was able to solve this issue by updating the driver from 390 to 396. The following resources were very helpful in resolving it. Hopefully this helps others in the future.

  1. https://docs-dev.fast.ai/troubleshoot#correctly-configured-nvidia-drivers - Overall steps to follow, but not complete: you still need to work out the driver installation specific to the platform (OS) you are using.
  2. https://medium.com/@philliplakis/removing-nvidia-drivers-on-centos-7-for-upgrading-bf00a2a5f0df - I mostly followed the instructions in this blog for the installation. There is a catch, though: uninstalling the old driver (390) did not work, because the uninstall command could not find it even though nvidia-smi was still working with 390 at that point. We went ahead and installed 396 anyway, and it worked. We answered yes to all the prompts along the way (the DKMS kernel builds, etc.).
  3. https://www.kinetica.com/docs/install/nvidia_rhel.html - A fallback if the blog above does not help.

Thanks @shriram.jaju for the help

Hi all,
I am facing the same issue:

How do I resolve it?
@raghavab1992 can you please describe the steps you followed?

@shwetap7 did you try the steps listed in the post above?

I followed the steps described below and they worked for me.

  1. clone the course repository

    git clone https://github.com/fastai/course-v3
    cd course-v3
    
  2. create an environment: https://conda.io/docs/user-guide/install/linux.html

    conda create -n fastai python=3.6  
    
  3. activate fastai environment

    source activate fastai 
    
  4. install the required packages: https://github.com/fastai/fastai/blob/master/README.md

    conda install -c pytorch pytorch-nightly cuda92
    conda install -c fastai torchvision-nightly
    conda install -c fastai fastai
    
  5. test it by starting the jupyter notebook

    jupyter notebook
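
Before opening a notebook, it can help to confirm that the packages from step 4 actually landed in the active environment. The following is only a sketch: missing_packages is a hypothetical helper name, and the default list of module names (torch, torchvision, fastai) is an assumption based on the install commands above. Run it with the fastai environment activated.

```python
import importlib.util


def missing_packages(names=("torch", "torchvision", "fastai")):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("torch, torchvision and fastai are all importable")
```

This only checks that the modules can be located; it says nothing about whether the GPU is usable, which depends on the NVIDIA driver.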
    

To update the fastai library, follow these steps:

cd course-v3
git pull
conda update conda -y
source activate fastai 
conda update -c fastai fastai
# or, alternatively, with pip:
pip install fastai --upgrade

@muhajir the steps listed will work if the NVIDIA driver version is 396.xx or later, so the drivers need to be updated in addition to following those steps.
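
To check an installed driver against that threshold, a minimal sketch could look like the one below. The helper names and the MIN_DRIVER constant are made up for illustration, and the (396, 0) minimum is simply the "396.xx+" figure quoted in this thread, not an official requirement. The nvidia-smi query flags (--query-gpu=driver_version, --format=csv,noheader) are standard nvidia-smi options.

```python
import shutil
import subprocess

# Assumption: threshold taken from this thread ("396.xx+")
MIN_DRIVER = (396, 0)


def parse_version(text):
    """Turn a driver string such as '396.54' into a tuple of ints, e.g. (396, 54)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(version, minimum=MIN_DRIVER):
    """Compare a driver version string against the minimum, component by component."""
    return parse_version(version) >= minimum


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("Could not query the driver (is nvidia-smi on the PATH?)")
    elif driver_is_new_enough(version):
        print("Driver", version, "meets the minimum")
    else:
        print("Driver", version, "is older than 396; update it")
```

With the versions mentioned in this thread, 396.54 passes the check, while 387.34 does not.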

Thanks for listing all the steps. I followed all of them but had no success. I am currently using CUDA 8.0.

I have gone through the steps, but I am not able to open the first link you mentioned.
I am using driver version 387.34 and CUDA 8.0.

I would uninstall everything, install the newest NVIDIA drivers, then install everything according to the guide.

PyTorch comes packaged with CUDA by default, so the system CUDA does not need to be changed. Try updating the NVIDIA drivers to 396.xx+ for the OS you are using; that should fix the issue.

Which version of pytorch comes with cuda 10? I’d really like to leverage tensor cores and fp16. Thanks.

Hi Raghav,
Sorry for the late post.

I have a doubt about the compatibility of CUDA 8.0 with the NVIDIA driver version.
Will CUDA 8.0 support NVIDIA driver version 396.xx+, or do I need to install CUDA 9.0 for driver version 396.xx+?

I am currently following this post:

Steps:
So far I have uninstalled NVIDIA driver 387.34 (with CUDA 8.0), and the next step is to install 396.18 (with CUDA 8.0).
OS used: Ubuntu 16.04

@shwetap7 forget about CUDA for now and just upgrade the NVIDIA drivers to 396+. Unlike TensorFlow, PyTorch comes bundled with CUDA, so fastai, which sits on top of PyTorch, should work.

thanks will try that…

Hi, could you tell me the exact 396 version you are using? I mean, what is xx in 396.xx?

@ymittal23 the exact version I'm currently using is 396.54.

Thanks for the help.

I am getting "Floating point exception (core dumped)" when running this in lesson 1:

interp = ClassificationInterpretation.from_learner(learn)

I tried it with CUDA 9.0 and cuDNN 9.1, and also with CUDA 9.2 and cuDNN 7.4.
Didn’t get the issue.