Fastai v1 install issues thread

Mates, this is not strictly an “install issue,” so I posted in another section, but my (successfully installed) fastai v1 refuses to work properly.

This is the thread: Python refuses abiding by SIGKILL

Maybe the maintainers could take a look.

Hi @stas ,

In the installation instructions you write:

However, note that you will most likely need a 396.xx+ driver for pytorch built with cuda92. For older drivers you will probably need to install pytorch with cuda90 or even earlier.

Is the 396.xx driver a hard requirement for PyTorch 1.0 and/or fastai-v1 (used for the latest course)?

My machine currently has driver 367.106 (per nvidia-smi) with CUDA 8.0. Can I install pytorch-nightly with the latest fastai on my old 367 driver?

I didn’t want to mess up my current working driver setup, so I paused updating the driver until I got some guidance. I have historically wasted days getting drivers to install correctly and creating separate working environments for fastai and keras.

Any guidance on whether the 396 driver is a hard requirement would be appreciated.

I did a bit of googling and these seem to be the requirements for the three CUDA versions pytorch currently supports in binary form:

CUDA 8.0 requires NVIDIA 361+
CUDA 9.0 requires NVIDIA 384+
CUDA 9.2 requires NVIDIA 396+

So yes, you should be able to use it. Just change the install instruction to:

conda install -c pytorch pytorch-nightly cuda80

or:

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu80/torch_nightly.html

update: I documented the requirements here.

update: I found the authoritative source with all of it, including cuda 10!

CUDA Toolkit                 Linux x86_64 Driver   Windows x86_64 Driver
CUDA 10.0.130                >= 410.48             >= 411.31
CUDA 9.2 (9.2.148 Update 1)  >= 396.37             >= 398.26
CUDA 9.2 (9.2.88)            >= 396.26             >= 397.44
CUDA 9.1 (9.1.85)            >= 390.46             >= 391.29
CUDA 9.0 (9.0.76)            >= 384.81             >= 385.54
CUDA 8.0 (8.0.61 GA2)        >= 375.26             >= 376.51
CUDA 8.0 (8.0.44)            >= 367.48             >= 369.30
CUDA 7.5 (7.5.16)            >= 352.31             >= 353.66
CUDA 7.0 (7.0.28)            >= 346.46             >= 347.62
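One subtlety when reading that table: NVIDIA driver versions compare component-wise, not as decimals (367.106 is newer than 367.48). Here is a minimal sketch of a lookup against the Linux column, using the lowest listed driver per toolkit; the function name is mine, not part of any pytorch or fastai API:

```python
# (CUDA toolkit, lowest listed Linux x86_64 driver), newest first,
# taken from the table above.
MIN_LINUX_DRIVER = [
    ("10.0", "410.48"),
    ("9.2",  "396.26"),
    ("9.1",  "390.46"),
    ("9.0",  "384.81"),
    ("8.0",  "367.48"),
    ("7.5",  "352.31"),
    ("7.0",  "346.46"),
]

def _ver(v):
    """'367.106' -> (367, 106), so versions compare component-wise."""
    return tuple(int(x) for x in v.split("."))

def newest_supported_cuda(driver):
    """Newest CUDA toolkit the given Linux driver version can run."""
    for cuda, min_driver in MIN_LINUX_DRIVER:
        if _ver(driver) >= _ver(min_driver):
            return cuda
    raise ValueError(f"driver {driver} predates every listed CUDA release")

print(newest_supported_cuda("367.106"))  # -> 8.0 (the driver asked about above)
print(newest_supported_cuda("396.44"))   # -> 9.2
```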

Perfect, thanks!

Thank you for the installation troubleshooting documentation. This is the best doc I have come across for understanding GPU driver/CUDA/cuDNN/PyTorch/fastai installation requirements. 10x better than nVidia’s own documentation :)


Hello, has anyone succeeded in installing and using v1 in Colab? I tried to install the dependencies by using

!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html
!pip install fastai

At first, I thought this would work because I could import fastai. But when I tried to do some training, this error appeared:

RuntimeError: DataLoader worker (pid 240) is killed by signal: Bus error.

This was weird because when I looked into this error it seemed to be caused by limited shared memory. But I immediately checked GPU memory and it was barely used (<1 GB of the 12 GB available). Is there something I need to do first in order to use it in Colab?

This has been asked and answered many times, here for instance. The short answer is that Colab doesn’t support pytorch v1 yet, so it doesn’t support fastai v1.

Some users still have an issue with conda refusing to update/install beyond fastai-1.0.6, even after we reverted the numpy pin to what it was from day 0. If you’re one of them, please help us diagnose the problem so that we can fix it.

To explain the problem: if you previously installed some conda package foo that, for example, requires spacy<=2.0.15, and you now try to install fastai==1.0.12, which requires spacy==2.0.16, there is a conflict, and conda will search older fastai packages until it finds one whose dependencies don’t conflict. fastai-1.0.6 happens to be that one. So we need to find the package that interferes.
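As a toy illustration of that backtracking (this is not conda’s actual solver; the package foo and the spacy pin attributed to fastai-1.0.6 are assumed for the example):

```python
def version_tuple(v):
    return tuple(int(x) for x in v.split("."))

def satisfies(version, op, bound):
    v, b = version_tuple(version), version_tuple(bound)
    return {"==": v == b, "<=": v <= b, ">=": v >= b}[op]

# Hypothetical previously installed package "foo" pins spacy <= 2.0.15.
foo_pin = ("<=", "2.0.15")

# The spacy version each fastai release pins (1.0.12 from the post;
# the 1.0.6 value is assumed for illustration).
fastai_needs = {"1.0.12": "2.0.16", "1.0.6": "2.0.15"}

def resolvable(fastai_version):
    """Can this fastai release coexist with foo's spacy pin?"""
    op, bound = foo_pin
    return satisfies(fastai_needs[fastai_version], op, bound)

print(resolvable("1.0.12"))  # False: spacy 2.0.16 violates foo's <=2.0.15 pin
print(resolvable("1.0.6"))   # True: so the solver falls back to this release
```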

The only difference in conda dependencies that changed between 1.0.6 and 1.0.12 is:

- spacy ==2.0.16
- regex ==2018.8.29
- thinc ==6.12.0
- cymem ==2.0.2
- pyyaml

Could those who have this problem try to install these specific versions on your setup? i.e.:

conda install spacy==2.0.16
conda install regex==2018.8.29
conda install thinc==6.12.0
conda install cymem==2.0.2
conda install pyyaml

If any of these fails (i.e. conda refuses to install the package, or says it doesn’t have it), rerun the same command with --debug added. It will produce a lot of output; please paste it on https://pastebin.com/ and post the link here. For example:

conda install spacy==2.0.16 --debug >& out

and paste the contents of out to pastebin.

Thank you.

There is also a very simple solution. Create a fresh conda environment and don’t install anything there other than fastai and its requirements, and keep it that way. You will not have any such conflicts in the future.

pinging: @shoof, @jakcycsl since you reported this issue originally.


@stas I think providing the following info would help with troubleshooting:

  1. conda version
  2. python version used in conda
  3. OS version

The problem I had was that I could not get fastai > 1.0.6 when using conda 4.3.0 on Python 3.6, Ubuntu 18.04. Initially I pip-installed the git repo directly to get the latest 1.0.12dev, and that worked. I then followed your suggestion to update my conda to the latest version possible (4.5.11) for Python 3.6, after which I was able to update fastai to 1.0.11 (not 1.0.12). Both 1.0.11 and 1.0.12dev have had no issues so far, but I am using 1.0.11 now.


That’s very useful feedback, @shoof. Thank you.

So you’re saying that it was sufficient to update your conda to the latest and the problem was fixed.

And unlike requiring anaconda itself, requiring a certain conda version should work for both miniconda and anaconda users, correct?

@jeremy, you earlier said you didn’t think we should pin minimum versions for pip and conda, but it looks like we may need to in this case, since most likely something in the older conda versions sets pins that conflict with our dependencies. I certainly traced the lower numpy pin to the anaconda package, whose bundle probably contains packages pinned to a low version of numpy.

Unlike other packages, I believe pinning install managers to minimum versions is a good thing and doesn’t force anything weird on users (>= of course, not ==). Especially since, if we don’t, we will have to require users to use their virtual env exclusively for fastai in order to avoid pin conflicts with any other package that might decide to pin down something we need a higher version of. So to me, asking for the latest (or nearly latest) package manager for pypi and conda is a reasonable thing to do. That was also the reason I bumped the pip requirement (though I have since reverted it).

But I’m also fine with not enforcing that and just having the docs say: update your pip/conda first before installing fastai, or do so if you have a problem. We already say that - except who reads the docs…

Yes.

My miniconda setup works. I suppose Anaconda should work too. If people use different Python versions, their conda versions may not update to the latest one (5.3.0 on Python 3.7 for now?), but so far Python 3.6 + miniconda 4.5.11 + Ubuntu 18.04 has had no issues.

If for some reason people cannot update conda to the latest version, a quick and dirty solution is to pip install from the github repo directly:
pip install git+https://github.com/fastai/fastai/

This approach does not require a conda update at all. I think it may not be sustainable throughout the course, though, and may cause more issues across different setups.


Well, I realize my comment implied I was suggesting we always require the latest version; to clarify, I only meant a newer version that we know works for everyone.

And yes, it works just fine on anaconda.

Thanks again.


Not quite what I said (or at least not what I meant). ;)

I think we shouldn’t ever pin things unless we explicitly know that a particular version is required to have fastai work correctly. I have a feeling however that pinning conda itself doesn’t work. A quick bit of googling suggests that may be the case.


OK, so you’re making a critical distinction between “working correctly” and “automatically installing correctly”. If the latter is not a requirement and can be documented instead, then I understand your position better now. All is good then.

I will add a note not to pin or up the pins unless fastai can’t work correctly w/o it.

Thank you.


I would certainly like that too - but unfortunately I don’t think you can automatically update conda using conda itself. (I would love to be wrong on that, since apparently we do need a more up to date version.)


There seem to be differences between the install instructions on the docs page and the github readme page.

From the docs page:

conda install -c pytorch -c fastai fastai pytorch-nightly cuda92

From the github page:

conda install -c pytorch pytorch-nightly cuda92
conda install -c fastai torchvision-nightly
conda install -c fastai fastai

Is this intentional? Which should we follow? I’m currently going by the github page.

The github page is better. The docs page is a briefer version that saves space on the page, but makes debugging harder if something goes wrong.

I am having the same problem after updating the nVidia driver. The driver itself works and nvidia-smi sees it, but fastai does not find the GPU (because pytorch does not find it). I do not have a solution yet, but I have a couple of observations which might be useful.

The part that is failing seems to be pytorch loading module _C, and the error message is, as you indicate, that libcuda.so.1 is not found.

Here is the output of ldd run on that particular module:

todd@build ~ $ ldd ./anaconda3/envs/foo/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
	linux-vdso.so.1 =>  (0x00007ffeae4d7000)
	libshm.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libshm.so (0x00007f77c35b5000)
	libcudart-72ec04ea.so.9.2 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libcudart-72ec04ea.so.9.2 (0x00007f77c3348000)
	libnvToolsExt-3965bdd0.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libnvToolsExt-3965bdd0.so.1 (0x00007f77c313d000)
	libcaffe2.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libcaffe2.so (0x00007f77c0c67000)
	libc10.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libc10.so (0x00007f77c0a52000)
	libcaffe2_gpu.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so (0x00007f779482b000)
	libtorch.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libtorch.so.1 (0x00007f7793bd8000)
	libgcc_s.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/../../../libgcc_s.so.1 (0x00007f7793bc3000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7793982000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f77935b8000)
	/lib64/ld-linux-x86-64.so.2 (0x0000561479493000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f77933af000)
	libstdc++.so.6 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6 (0x00007f779326e000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7792f65000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7792d60000)
	libmkl_intel_lp64.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libmkl_intel_lp64.so (0x00007f779222f000)
	libmkl_gnu_thread.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libmkl_gnu_thread.so (0x00007f77909f6000)
	libmkl_core.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libmkl_core.so (0x00007f778c8bc000)
	libcuda.so.1 => not found
	libnvrtc-3fc78a74.so.9.2 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libnvrtc-3fc78a74.so.9.2 (0x00007f778b2ad000)
	libgomp.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libgomp.so.1 (0x00007f778b286000)
	libcuda.so.1 => not found

As you can see, libcuda.so.1 is referenced there, but it does not resolve successfully on my system (presumably because it no longer exists after the re-install of the nVidia driver). Note that there is also a reference to the CUDA 9.2 runtime, and that resolves successfully: it is part of the local pytorch installation.

I did the same ldd test on the official image on GCP, and in that one there is also a reference to libcuda.so.1, but that resolves because it exists on that image (in /usr/lib/x86_64-linux-gnu).

So, what I currently believe is that the pytorch bundled in the fastai library does indeed use its own internal cuda 9.2, but in addition it requires a libcuda.so.1, probably for some auxiliary purpose separate from its main use of cuda.
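For what it’s worth, the ldd check above can be scripted. A small sketch (the helper name is mine, and the parsing assumes ldd’s usual `name => path` output format):

```python
def missing_shared_libs(ldd_output):
    """Return library names that ldd reported as 'not found'."""
    missing = set()
    for line in ldd_output.splitlines():
        if "not found" in line:
            # everything before '=>' is the library name
            missing.add(line.split("=>")[0].strip())
    return sorted(missing)

# Trimmed example of the kind of output shown above:
sample = """\
\tlibmkl_core.so => /home/todd/lib/libmkl_core.so (0x00007f778c8bc000)
\tlibcuda.so.1 => not found
\tlibgomp.so.1 => /home/todd/lib/libgomp.so.1 (0x00007f778b286000)
\tlibcuda.so.1 => not found
"""
print(missing_shared_libs(sample))  # -> ['libcuda.so.1']

# In practice you would feed it the real output, e.g. from:
#   ldd .../torch/_C.cpython-36m-x86_64-linux-gnu.so
```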

That’s where I’ll leave it for this evening.

Follow-up: I got my local pytorch and fastai to work after updating the nVidia driver, by installing cuda as follows:

sudo apt-get install nvidia-cuda-toolkit

Apparently, pytorch needs both its own internal cuda library, as well as a system libcuda.so.1 to be present, in order to get past initialization.
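One quick way to check whether a system-wide libcuda is now resolvable, a sketch using only the Python stdlib (the result depends on the machine it runs on, so no output is shown):

```python
import ctypes.util

# find_library searches the standard dynamic-linker paths; on Linux it
# returns something like 'libcuda.so.1' when the driver library is
# installed system-wide, and None when it is missing.
found = ctypes.util.find_library("cuda")
print("libcuda resolvable:", found)
```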

Perhaps your installation is broken or incomplete? If you follow the apt path, the nvidia-396 apt package should have installed libcuda1-396, which includes libcuda.so.1:

$ dpkg -S  libcuda.so.1
libcuda1-396: /usr/lib/i386-linux-gnu/libcuda.so.1
libcuda1-396: /usr/lib/x86_64-linux-gnu/libcuda.so.1

$ apt list libcuda1-396
Listing... Done
libcuda1-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

$ apt-cache rdepends libcuda1-396 | grep nvidia
  nvidia-396

$ apt list nvidia-396
  nvidia-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

of course change 396 to whatever version you’re using.

The first command finds the apt package the file belongs to, the second checks whether that package is installed, the third lists which packages depend on it, and the fourth confirms that the parent apt package needing it is installed. So basically, if you were to apt install nvidia-396, you would get libcuda1-396 and thus have libcuda.so.1 installed.

This is on Ubuntu-18.04.

Now documented here:
https://docs-dev.fast.ai/troubleshoot.html#libcudaso1-cannot-open-shared-object-file

@tdoucet, can you please send the output of this on the setup you had the issue with:

python -c 'import fastai; fastai.show_install(0)'

and also, if possible, how you installed the nvidia drivers: manually or via apt - and if the latter, what the command was.