Fastai v1 install issues thread

Hello, has anyone succeeded in installing and using v1 in Colab? I tried to install the dependencies with:

!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html
!pip install fastai

At first I thought this would work because I could import fastai. But when I tried to do some training, this error appeared:

RuntimeError: DataLoader worker (pid 240) is killed by signal: Bus error.

This was weird, because when I looked up this error it's supposedly caused by limited shared memory. But I immediately checked GPU memory and it was barely used (<1 GB out of the 12 GB available). Is there something that I need to do first in order to use it in Colab?
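For reference, a quick way to check whether shared memory really is the bottleneck (the usual cause of that Bus error) is to look at /dev/shm from the notebook:

# how much shared memory does the runtime expose? DataLoader workers pass
# tensors through /dev/shm, so a tiny size can trigger exactly this Bus error
!df -h /dev/shm

If it is tiny, passing num_workers=0 when creating the DataBunch usually works around the error, at the cost of slower data loading.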

This has been asked and answered many times, here for instance. The short answer is that Colab doesn't support pytorch v1 yet, so it doesn't support fastai v1.

Some users still have an issue with conda refusing to update/install beyond fastai-1.0.6, even after we reverted the numpy pin to what it was from day 0. If you're one of them, please help us diagnose the problem so that we can fix it.

To explain the problem: if you previously installed some conda package foo that, for example, requires spacy<=2.0.15, and now you're trying to install fastai==1.0.12, which requires spacy==2.0.16, there is a conflict. conda will then search older fastai packages until it finds one whose dependencies don't conflict, and fastai-1.0.6 happens to be that one. So we need to find the package that interferes.
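If you'd like to help pinpoint the offender, here is a rough sketch for listing the installed conda packages whose metadata mentions spacy (the path assumes a default miniconda3 install; adjust it to your conda root):

# every installed conda package leaves a JSON record in conda-meta that lists
# its "depends" pins; any file matching here references spacy in some way
grep -l '"spacy' ~/miniconda3/conda-meta/*.json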

The only conda dependencies that changed between 1.0.6 and 1.0.12 are:

- spacy ==2.0.16
- regex ==2018.8.29
- thinc ==6.12.0
- cymem ==2.0.2
- pyyaml

Could those of you who have this problem try to install these specific versions on your setup? i.e.:

conda install spacy==2.0.16
conda install regex==2018.8.29
conda install thinc==6.12.0
conda install cymem==2.0.2
conda install pyyaml

If any of these fails (i.e. conda refuses to install that package, or says it doesn't have it), rerun the same command with --debug added. It will produce a lot of output; please paste it on https://pastebin.com/ and post the link here. For example:

conda install spacy==2.0.16 --debug >& out

and paste the contents of out to pastebin.

Thank you.

There is also a very simple solution. Create a fresh conda environment and don’t install anything there other than fastai and its requirements, and keep it that way. You will not have any such conflicts in the future.
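For example, a minimal sketch of such a dedicated environment (the environment name is just an example; conda activate needs conda >= 4.4, older versions use source activate):

# create an env that contains nothing but fastai and its requirements
conda create -n fastai-clean python=3.6
conda activate fastai-clean
conda install -c pytorch -c fastai fastai pytorch-nightly cuda92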

pinging: @shoof, @jakcycsl since you reported this issue originally.


@stas I think providing the following info would help with troubleshooting:

  1. conda version
  2. python version used in conda
  3. OS version

The problem I had was not being able to get fastai > 1.0.6 when using conda 4.3.0 on Python 3.6, Ubuntu 18.04. Initially I pip-installed the git repo directly to get the latest 1.0.12dev and it worked. I then followed your suggestion to update conda to the latest version available for Python 3.6 (4.5.11), and after that I was able to update fastai to 1.0.11 (not 1.0.12). Both 1.0.11 and 1.0.12dev have had no issues so far, but I am using 1.0.11 now.
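For anyone else hitting this, the fix boils down to something like (a sketch; the install command is the one from the docs):

# update the package manager first, then retry the fastai install
conda update conda
conda install -c pytorch -c fastai fastai pytorch-nightly cuda92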


That's very useful feedback, @shoof. Thank you.

So you’re saying that it was sufficient to update your conda to the latest and the problem was fixed.

And unlike requiring a certain anaconda version, requiring a certain conda version should work for both miniconda and anaconda users, correct?

@jeremy, you said earlier that you didn't think we should pin minimum versions for pip and conda, but it looks like we may need to in this case, since most likely something in the older conda versions sets pins that conflict with our dependencies. I certainly traced the earlier numpy problem to the anaconda package, whose bundle probably included packages pinned to a low version of numpy.

Unlike other packages, I believe pinning install managers to newer versions is a good thing and doesn't force anything weird on users (>= of course, not ==). Especially since, if we don't, we will have to require users to dedicate their virtual env exclusively to fastai in order to avoid pin conflicts with any other package out there which might decide to pin down something we need a higher version of. So to me, asking for the latest (or nearly latest) package manager for pypi and conda is a reasonable thing to do. That was also the reason I bumped the pip requirement (though I have reverted it since).

But I'm also fine with not enforcing that and just having the docs say: update your pip/conda before installing fastai, or do so if you have a problem. We already say that - except who reads the docs…

Yes.

My miniconda setup works. I suppose Anaconda should work too. If people use different Python versions, their conda versions may not update to the latest one (5.3.0 on Python 3.7 for now?), but so far Python 3.6 + miniconda 4.5.11 + Ubuntu 18.04 has had no issues.

If for some reason people cannot update conda to the latest version, a quick and dirty solution is to pip install from the github repo directly:
pip install git+https://github.com/fastai/fastai/

This approach does not require a conda update at all. I think it may not be sustainable throughout the course and may cause more issues with different setups, though.
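A quick way to confirm which version that pip install actually gave you:

python -c 'import fastai; print(fastai.__version__)'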


Well, I realize my comment implied I was suggesting we always require the latest; to clarify, I only meant a newer version that we know works for everyone.

And yes, it works just fine on anaconda.

Thanks again.


Not quite what I said (or at least not what I meant). :wink:

I think we shouldn’t ever pin things unless we explicitly know that a particular version is required to have fastai work correctly. I have a feeling however that pinning conda itself doesn’t work. A quick bit of googling suggests that may be the case.


OK, so you're making a critical distinction between "working correctly" and "installing correctly automatically". If the latter is not a requirement and can be documented instead, then I understand your position better now. All is good then.

I will add a note not to pin, or raise the pins, unless fastai can't work correctly without it.

Thank you.


I would certainly like that too - but unfortunately I don’t think you can automatically update conda using conda itself. (I would love to be wrong on that, since apparently we do need a more up to date version.)


There seem to be differences between the install instructions on the docs page and the github readme page.

From the docs page:

conda install -c pytorch -c fastai fastai pytorch-nightly cuda92

From the github page:

conda install -c pytorch pytorch-nightly cuda92
conda install -c fastai torchvision-nightly
conda install -c fastai fastai

Is this intentional? Which should we follow? I'm currently going by the github page.

The github page is better. The docs page is a briefer version that saves space on the page, but makes debugging harder if something goes wrong.

I am having the same problem, after updating the NVIDIA driver. The driver itself works, and nvidia-smi sees it, but fastai does not find the GPU (because pytorch does not find it). I do not have a solution yet, but I have a couple of observations which might be useful.

The part that is failing seems to be pytorch loading module _C, and the error message is, as you indicate, that libcuda.so.1 is not found.

Here is the output of ldd run on that particular module:

todd@build ~ $ ldd ./anaconda3/envs/foo/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
	linux-vdso.so.1 =>  (0x00007ffeae4d7000)
	libshm.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libshm.so (0x00007f77c35b5000)
	libcudart-72ec04ea.so.9.2 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libcudart-72ec04ea.so.9.2 (0x00007f77c3348000)
	libnvToolsExt-3965bdd0.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libnvToolsExt-3965bdd0.so.1 (0x00007f77c313d000)
	libcaffe2.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libcaffe2.so (0x00007f77c0c67000)
	libc10.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libc10.so (0x00007f77c0a52000)
	libcaffe2_gpu.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so (0x00007f779482b000)
	libtorch.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libtorch.so.1 (0x00007f7793bd8000)
	libgcc_s.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/../../../libgcc_s.so.1 (0x00007f7793bc3000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7793982000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f77935b8000)
	/lib64/ld-linux-x86-64.so.2 (0x0000561479493000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f77933af000)
	libstdc++.so.6 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6 (0x00007f779326e000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7792f65000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7792d60000)
	libmkl_intel_lp64.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libmkl_intel_lp64.so (0x00007f779222f000)
	libmkl_gnu_thread.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libmkl_gnu_thread.so (0x00007f77909f6000)
	libmkl_core.so => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libmkl_core.so (0x00007f778c8bc000)
	libcuda.so.1 => not found
	libnvrtc-3fc78a74.so.9.2 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/libnvrtc-3fc78a74.so.9.2 (0x00007f778b2ad000)
	libgomp.so.1 => /home/todd/./anaconda3/envs/foo/lib/python3.6/site-packages/torch/lib/../../../../libgomp.so.1 (0x00007f778b286000)
	libcuda.so.1 => not found

As you can see, libcuda.so.1 is referenced there, but it does not resolve successfully on my system (because it doesn't exist after the re-install of the NVIDIA driver, I guess). Note that there is also a reference to the cuda 9.2 runtime, and that resolves successfully: it is part of the local pytorch installation.

I did the same ldd test on the official image on GCP, and in that one there is also a reference to libcuda.so.1, but that resolves because it exists on that image (in /usr/lib/x86_64-linux-gnu).

So, what I currently believe is that the pytorch installed alongside fastai does indeed use its own internal cuda 9.2 runtime, but in addition it requires a libcuda.so.1, probably for some auxiliary purpose separate from its main use of cuda.
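In case it helps anyone reproduce this, a quick way to see whether the system has any libcuda the dynamic loader can find:

# ask the loader cache about libcuda, and look for stray copies under /usr/lib
ldconfig -p | grep libcuda
find /usr/lib -name 'libcuda.so*' 2>/dev/null

If both come back empty, that matches the unresolved entry in the ldd output above.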

That’s where I’ll leave it for this evening.

Follow-up: I got my local pytorch and fastai to work after updating the NVIDIA driver by installing cuda as follows:

sudo apt-get install nvidia-cuda-toolkit

Apparently, pytorch needs both its own internal cuda library and a system libcuda.so.1 to be present in order to get past initialization.
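A quick sanity check after installing the toolkit (this is just how I verified it, nothing official):

# the torch import should now get past loading _C, and the GPU should be visible
python -c 'import torch; print(torch.cuda.is_available())'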

Perhaps your installation is broken or incomplete? If you follow the apt path, the nvidia-396 apt package should have installed libcuda1-396, which includes libcuda.so.1:

$ dpkg -S  libcuda.so.1
libcuda1-396: /usr/lib/i386-linux-gnu/libcuda.so.1
libcuda1-396: /usr/lib/x86_64-linux-gnu/libcuda.so.1

$ apt list libcuda1-396
Listing... Done
libcuda1-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

$ apt-cache rdepends libcuda1-396 | grep nvidia
  nvidia-396

$ apt list nvidia-396
  nvidia-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

Of course, change 396 to whatever version you're using.

The first command finds the apt package the file belongs to, the second checks whether that package is installed, the third shows which packages need that package, and the fourth confirms that the parent apt package needing it is installed. So basically, if you were to apt install nvidia-396, you would get libcuda1-396 and therefore libcuda.so.1 installed.

This is on Ubuntu-18.04.

Now documented here:
https://docs-dev.fast.ai/troubleshoot.html#libcudaso1-cannot-open-shared-object-file

@tdoucet, can you please send the output of this on the setup you had the issue with:

python -c 'import fastai; fastai.show_install(0)'

and also, if possible, how you installed the NVIDIA drivers: manually or via apt - and if the latter, what the command was.

The apt path you describe shows that libcuda1-396 depends on nvidia-396, but that does not necessarily mean that you’ll get libcuda1-396 when you install nvidia-396.

On my system, the nvidia-396 package “recommends” libcuda1-396, but does not install it by default because the nvidia-396 package does not “require” it. This is on Linux Mint 18.3. I believe this situation is consistent also with the listing you provided.

However, this does indicate that it should work to explicitly install the libcuda1-396 package, with

apt-get install libcuda1-396

and indeed that gets you a better version of cuda than what I tried before.

(For completeness, I used the “-410” variants and not “-396” but it probably doesn’t matter and I wanted to keep the example the same.)

So, to summarize: on my system at least, I updated the NVIDIA driver, and also the recommended external cuda library (required by pytorch for peripheral purposes), using the following commands, and that made it work for me:

sudo apt-get install nvidia-410
sudo apt-get install libcuda1-410

It is a little unfortunate that pytorch is almost, but not quite, independent of the system-installed cuda. It packages its own, but still needs an external one, which gives fastai the same dependency. This is unfortunate because there are potentially other clients of cuda that have their own requirements for the system-installed cuda. For example, doing the above broke my working TensorFlow setup. I'll rebuild TensorFlow and fix it, but this is an example of fastai (via pytorch) being not quite self-contained. I think for the final 1.0 version of the library it would be much better if it were.


You’re absolutely correct, @tdoucet. I approached it from the wrong direction.

apt-cache depends nvidia-396 | grep libcuda
  Recommends: libcuda1-396

So, yes, a bummer it is.

Strangely enough, there is only one report of this missing library on the pytorch forums. I guess most people haven't followed fastai's recent recommendation that CUDA no longer needs to be installed system-wide. I agree that this is a very confusing situation. Let me research it some more.

Your input was very useful, Todd.

Update: it's a build bug in pytorch; it will be fixed in Friday's nightly build. libcuda.so.1 shouldn't need to be installed system-wide.


No worries. Thanks for letting me know.