This thread can be used by anyone trying to set up their own GPU with the current fastai and PyTorch libraries.
I am currently using a GeForce 1080 Ti GPU and have successfully installed the fastai package, but it does not recognise the NVIDIA/CUDA drivers. As a result the code runs on the CPU instead of the GPU, which makes it very slow. I am using CUDA 9.2 with cuDNN 7.3.1. I get the following error when trying to access the GPU.
A quick Google search suggests that the NVIDIA driver version should be 396, whereas mine is currently 390. Can anyone help with how to update from 390 to 396?
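For anyone who wants to check their own driver against that number, here is a rough Python sketch. It asks `nvidia-smi` for the driver version and compares it with 396. The 396 threshold is simply the figure quoted in this thread, and the helper names (`parse_version`, `installed_driver_version`, `needs_upgrade`) are made up for illustration.

```python
import subprocess

# Threshold quoted in this thread for the CUDA 9.2 build; treat it as an assumption.
MIN_DRIVER = (396, 0)


def parse_version(text):
    """Turn a string such as '390.48' into a tuple of ints, e.g. (390, 48)."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be read."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
        lines = out.strip().splitlines()
        # With several GPUs there is one line per GPU; they share one driver.
        return parse_version(lines[0]) if lines else None
    except (OSError, ValueError, subprocess.CalledProcessError):
        # nvidia-smi missing, failing, or printing something unexpected.
        return None


def needs_upgrade(version, minimum=MIN_DRIVER):
    """True when no driver was found or the driver is older than the minimum."""
    return version is None or version < minimum


version = installed_driver_version()
print("driver version:", version)
print("upgrade suggested:", needs_upgrade(version))
```

On a machine without a working `nvidia-smi` the version prints as `None` and an upgrade is suggested, which is roughly the situation described above.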
Did you try reinstalling the drivers? I faced a similar problem on Paperspace, where it was not using the GPU to train. I reinstalled the drivers and restarted the machine, and it worked just fine.
apt-get install cuda-drivers
I was able to solve this issue by updating the driver from 390 to 396. The following resources were very helpful in resolving it. Hopefully this helps others in the future.
- https://docs-dev.fast.ai/troubleshoot#correctly-configured-nvidia-drivers - Gives the overall steps to follow, but it is not complete: you still need to work out the driver installation itself for the platform (OS) you are using.
- https://medium.com/@philliplakis/removing-nvidia-drivers-on-centos-7-for-upgrading-bf00a2a5f0df - I mostly followed the instructions in this blog for the installation. There is a catch, though: uninstalling the old 390 drivers did not work, because the uninstall command could not find them even though nvidia-smi was still reporting 390. We went ahead and installed 396 anyway, and it worked. We answered yes to all the prompts along the way (the DKMS kernel builds, etc.).
- https://www.kinetica.com/docs/install/nvidia_rhel.html - A fallback if the blog above does not help.
Thanks @shriram.jaju for the help
I am facing the same issue:
How do I resolve it?
@raghavab1992 can you please describe the steps you followed?
I have followed the steps described below and it worked for me.
git clone https://github.com/fastai/course-v3
cd course-v3
create an environment: https://conda.io/docs/user-guide/install/linux.html
conda create -n fastai python=3.6
activate fastai environment
source activate fastai
install the required packages: https://github.com/fastai/fastai/blob/master/README.md
conda install -c pytorch pytorch-nightly cuda92
conda install -c fastai torchvision-nightly
conda install -c fastai fastai
test it by starting the jupyter notebook
To update the fastai library follow the steps:
cd course-v3
git pull
conda update conda -y
source activate fastai
conda update -c fastai fastai
or (pip install fastai --upgrade)
@muhajir the steps listed will work only if the NVIDIA driver version is 396.xx or later, so the drivers need to be updated in addition to the steps listed.
Thanks for listing all the steps. I followed them all but had no success; I am currently using CUDA 8.0.
I have gone through the steps, but I am not able to open the first link that you mentioned.
I am using driver version 387.34 and CUDA 8.0.
I would uninstall everything, install the newest NVIDIA drivers, then install everything according to the guide.
PyTorch comes packaged with CUDA by default, so the system CUDA does not need to be changed. Try updating the NVIDIA drivers to 396.xx+ for the OS you are using; that should fix the issue.
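To see what PyTorch itself reports, something like the sketch below may help. It only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the `gpu_report` helper and its dictionary keys are just illustrative names, and the import is guarded so the snippet still runs where torch is not installed.

```python
def gpu_report():
    """Collect what PyTorch can see; values stay None if torch is not installed."""
    report = {
        "torch": None,
        "bundled_cuda": None,
        "cuda_available": None,
        "device": None,
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built with (None for CPU-only builds).
    # This is independent of any CUDA toolkit installed system-wide.
    report["bundled_cuda"] = torch.version.cuda
    # False usually points at a driver problem or a CPU-only build.
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


for key, value in gpu_report().items():
    print(key, "=", value)
```

If `bundled_cuda` shows a version but `cuda_available` is `False`, the driver is the first thing worth checking, in line with the advice above.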
Which version of pytorch comes with cuda 10? I’d really like to leverage tensor cores and fp16. Thanks.
Sorry for the late post.
I have a doubt about the version compatibility of CUDA 8.0 with the NVIDIA driver version.
Will CUDA 8.0 support NVIDIA driver version 396.xx+?
Do I need to install CUDA 9.0 for NVIDIA driver version 396.xx+?
Currently I am following this post:
So far I have uninstalled NVIDIA driver 387.34 on CUDA 8.0, and the next step is to install 396.18 on CUDA 8.0.
OS used: Ubuntu 16.04
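On the compatibility question: NVIDIA drivers are backward compatible, so a newer driver can still run software built against an older CUDA version. The sketch below encodes the minimum Linux driver per CUDA toolkit as I recall it from NVIDIA's release notes. Please treat the numbers as assumptions and verify them against the official table; `MIN_LINUX_DRIVER` and `driver_supports` are illustrative names.

```python
# Minimum Linux driver per CUDA toolkit, recalled from NVIDIA's release notes.
# Assumed values: double-check them against the official compatibility table.
MIN_LINUX_DRIVER = {
    "8.0": (375, 26),
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
    "10.0": (410, 48),
}


def driver_supports(driver, cuda):
    """True if the driver (e.g. '387.34') meets the minimum for that CUDA version."""
    have = tuple(int(part) for part in driver.split("."))
    return have >= MIN_LINUX_DRIVER[cuda]


for driver in ("387.34", "396.26"):
    for cuda in ("8.0", "9.2"):
        print("driver", driver, "with CUDA", cuda, "->", driver_supports(driver, cuda))
```

If these figures are right, 387.34 is enough for CUDA 8.0 but not for the CUDA 9.2 build of PyTorch, and 396.18 would fall slightly short of the 9.2 minimum, so a later 396.xx release may be the safer choice.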
@shwetap7 forget about CUDA for now and just upgrade the NVIDIA drivers to 396+. Unlike TensorFlow, PyTorch comes bundled with CUDA, so fastai, which sits on top of PyTorch, should work.
Thanks, will try that.
Hi, could you tell me the exact 396 version you are using? I mean, what is xx in 396.xx?
Thanks for help.
I am getting Floating point exception (core dumped)
on running this in lesson 1:
interp = ClassificationInterpretation.from_learner(learn)
I tried it with CUDA 9.0 and cuDNN 9.1,
and also with CUDA 9.2 and cuDNN 7.4.
Didn’t get the issue