Fastai v1 install issues thread

Follow-up: I got my local pytorch and fastai to work after updating the NVIDIA driver by installing cuda as follows:

sudo apt-get install nvidia-cuda-toolkit

Apparently, pytorch needs both its own internal cuda library and a system-wide libcuda.so.1 to be present in order to get past initialization.
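As a quick sanity check (a minimal sketch, not part of pytorch itself - the helper name is made up), you can ask the dynamic linker whether a system-wide libcuda can be located at all:

```python
import ctypes.util

def find_system_libcuda():
    """Return the shared-library name the dynamic linker resolves for
    'cuda' (e.g. 'libcuda.so.1'), or None if no system libcuda is found,
    in which case pytorch's initialization may fail as described above."""
    return ctypes.util.find_library("cuda")

print(find_system_libcuda())
```

On a machine without the driver's libcuda package this prints None, which matches the failure mode discussed in this thread.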

Perhaps your installation is broken or incomplete? If you follow the apt path, the nvidia-396 apt package should have installed libcuda1-396, which includes libcuda.so.1:

$ dpkg -S libcuda.so.1
libcuda1-396: /usr/lib/i386-linux-gnu/libcuda.so.1
libcuda1-396: /usr/lib/x86_64-linux-gnu/libcuda.so.1

$ apt list libcuda1-396
Listing... Done
libcuda1-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

$ apt-cache rdepends libcuda1-396 | grep nvidia
  nvidia-396

$ apt list nvidia-396
  nvidia-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

Of course, change 396 to whatever version you're using.

The first command finds the apt package the file belongs to, the 2nd checks whether that package is installed, the 3rd shows which packages depend on that package, and the 4th confirms that the parent apt package needing it is installed. So basically, if you were to apt install nvidia-396, you would have libcuda1-396 installed, and you'd have libcuda.so.1 installed.

This is on Ubuntu-18.04.

Now documented here:
https://docs-dev.fast.ai/troubleshoot.html#libcudaso1-cannot-open-shared-object-file

@tdoucet, can you please send the output of this on the setup you had the issue with:

python -c 'import fastai; fastai.show_install(0)'

and also, if possible, how you installed the nvidia drivers: manually or via apt - and if the latter, what was the command.

The apt path you describe shows that libcuda1-396 depends on nvidia-396, but that does not necessarily mean that you'll get libcuda1-396 when you install nvidia-396.

On my system, the nvidia-396 package "recommends" libcuda1-396, but does not install it by default because the nvidia-396 package does not "require" it. This is on Linux Mint 18.3. I believe this situation is consistent also with the listing you provided.

However, this does indicate that it should work to explicitly install the libcuda1-396 package, with

apt-get install libcuda1-396

and indeed that gets you a better version of cuda than what I tried before.

(For completeness, I used the "-410" variants and not "-396", but it probably doesn't matter and I wanted to keep the example the same.)

So, to summarize, on my system, at least, I updated the nvidia driver, and also the recommended external cuda (required by pytorch for peripheral purposes) using the following commands, and that made it work for me:

sudo apt-get install nvidia-410
sudo apt-get install libcuda1-410

It is a little unfortunate that pytorch is almost, but not quite, independent of the system-installed cuda. It packages its own, but still needs an external one. This gives fastai the same dependency, which is unfortunate because other clients of cuda may have their own requirements for the system-installed version. For example, doing the above broke my working TensorFlow setup. I'll rebuild TensorFlow and can fix it, but this is an example of fastai (via pytorch) being not quite self-contained. I think for the final 1.0 version of the library it would be much better if it were.
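A defensive way to probe this from Python (a hypothetical helper, hedged so that a missing torch or a failing libcuda load does not crash the caller):

```python
def cuda_status():
    """Report torch's CUDA state as a string, without letting a missing
    torch install or a broken CUDA initialization raise an exception."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    try:
        return "available" if torch.cuda.is_available() else "not available"
    except Exception as exc:  # e.g. libcuda.so.1 failing to load at init
        return f"cuda init failed: {exc}"

print(cuda_status())
```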


You're absolutely correct, @tdoucet. I approached it from the wrong direction.

apt-cache depends nvidia-396 | grep libcuda
  Recommends: libcuda1-396

So, yes, a bummer it is.

Strangely enough, there's only one report of this missing library on the pytorch forums. I guess they didn't read fastai's recent recommendation that CUDA doesn't need to be installed system-wide. I agree that this is a very confusing situation. Let me research it some more.

Your input was very useful, Todd.

Update: it's a build bug in pytorch - it will get fixed in Friday's nightly build. libcuda.so.1 shouldn't need to be installed system-wide.


No worries. Thanks for letting me know.

I'm using an account I have on a university server to do my fastai work, as it has a GPU (Tesla P100) I can use. I was able to install fastai without a problem and I've been using this setup for my coursework without an issue.
(To be specific: I ssh into my uni account, then I ssh from there onto a GPU node. In that GPU-node shell, I source activate my conda env, and try to do all my installing/updating in there.)

My account doesn't have root permissions, however, so sometimes this causes issues when I'm trying to update conda, fastai, etc. In particular, I can't run conda update conda, or use conda to update fastai, so I've been using pip install --upgrade fastai, even though I'm working in a conda environment. But this has started causing issues: I've gotten some errors about conda being out of date when I try this - these started this week.
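When mixing pip and conda like this, it can help to confirm which environment the interpreter actually lives in. A small sketch (the helper name and the envs/-directory fallback heuristic are my assumptions; CONDA_PREFIX itself is what source activate exports):

```python
import os
import sys

def active_conda_env():
    """Best-effort guess at the active conda environment prefix.

    CONDA_PREFIX is exported by `source activate` / `conda activate`.
    As a fallback, an interpreter whose sys.prefix sits under an envs/
    directory is assumed to belong to a conda env. Returns None if
    neither signal is present."""
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        return prefix
    if "envs" in sys.prefix.split(os.sep):
        return sys.prefix
    return None

print(active_conda_env())
```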

When I run conda update conda when I'm not inside my conda virtual environment, I get this message:

CondaIOError: Missing write permissions in: /gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1
#
# You don't appear to have the necessary permissions to update packages
# into the install area '/gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1'.
# However you can clone this environment into your home directory and
# then make changes to it.
# This may be done using the command:
#
# $ conda create -n my_root --clone="/gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1"

But when I activate my conda environment and try the same command, I get:

$ conda update conda

PackageNotInstalledError: Package is not installed in prefix.
  prefix: /users/a/r/areece1/.conda/envs/my_root_gpu
  package name: conda

I haven't been able to figure out how to get my env to recognize that conda is available as a package.

I know this is sort of a bespoke problem, but I'm hoping someone may be able to help me out here - I've googled around for the errors I'm getting, but all the answers I've found seem to rely on the ability to exercise sudo (or they involve doing Linux-y things beyond my current level of understanding).

I'm on RHEL 7.5, conda 4.3.30, Python 3.6.6. Thanks in advance!

Hi @agr, I had similar issues when trying to set up the fastai library on a scientific computing cluster. After reading this post I ended up contacting a sys-admin to update the conda package manager in the module I was loading to run fastai. Using conda 4.5.11 solved those issues for me.

Also, because I have little room in my home folder, I've tried modifying my Config() file to download data and models somewhere other than my home directory. I now get errors in both the untar_data() and create_cnn() functions in the first notebook, where they just won't download the data or pretrained model, and yet return objects (path, learner) which are not usable. Does anybody have experience with this?

Why is pytorch-nightly so slow to install with conda?

Hi All

I need your help. I have a Docker image that I set up, with fastai installed via pip.

Note: Everything worked well until this morning, when I tried to rebuild the docker image.

When I run my notebook and try to execute lesson 1, it complains that the python dependencies cv2, seaborn and bcolz are not installed.

When I check setup.py, it doesn't look like opencv and bcolz are defined as requirements.

Please help!

Apologies if this has been answered (could not find it).

Has anyone deployed a fastai model on a server (with dokku)? I need to build a requirements.txt file (no conda), but I don't know how to specify this in the file.

On the command line on my Mac I can use:

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

and this works (gives me a cpu-only install)

My current requirements.txt file:

  fastai
  Flask
  matplotlib
  numpy
  pandas
  Pillow
  plotly
  torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

With this file I get deployment errors:

remote: + [[ Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: ) == --* ]]
remote: + [[ Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: ) == ==* ]]
      ' 'Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: )'
       Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: )
remote: + read -r line
remote: + [[ No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1)) == --* ]]
remote: + [[ No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1)) == ==* ]]
      ' 'No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1))'
       No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1))

Fixed it...

To answer my own question:

Seems like the -f (--find-links) option does not work for the nightlies (or I'm using it wrong)...

This worked in my requirements.txt file:

numpy
torchvision_nightly
https://download.pytorch.org/whl/nightly/cpu/torch_nightly-1.0.0.dev20181105-cp37-cp37m-linux_x86_64.whl ; sys_platform == "linux"
fastai

Is torch 1.0.0.dev20181005 the right version? I canā€™t run course-v3 lesson2-sgd without changing the syntax.

Hi,

I tried every possible way to install the fastai library properly on my OSX local machine, without success. When I try to import it in a Jupyter Notebook, the fastai.imports library works just fine, but not the fastai.structured library. I keep getting this annoying error: ModuleNotFoundError: No module named 'fastai.structured'

I tried the pip command, and I tried following the installation instructions provided on the fastai Github page, but nothing has worked out so far.

Thank you for your help!

Hello!

torch cuda is Not available

My machine: Paperspace (Fastai) Ubuntu 16.04 with P4000 GPU
I tried to install with Conda:

conda create --name baseclone --clone base
source activate baseclone
conda update -n baseclone conda
conda update --all
conda install -c pytorch pytorch-nightly cuda92
conda install -c fastai torchvision-nightly
conda install -c fastai fastai

python -c 'import fastai; fastai.show_install(1)'

=== Software === 
python version : 3.6.6
fastai version : 1.0.21
torch version  : 1.0.0.dev20181108
torch cuda ver : 9.2.148
torch cuda is  : **Not available** 

=== Hardware === 
No GPUs available 

=== Environment === 
platform       : Linux-4.4.0-128-generic-x86_64-with-debian-stretch-sid
distro         : #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018
conda env      : baseclone
python         : /home/paperspace/anaconda3/envs/baseclone/bin/python
sys.path       : 
/home/paperspace/anaconda3/envs/baseclone/lib/python36.zip
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6/lib-dynload
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6/site-packages
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6/site-packages/IPython/extensions
no supported gpus found on this system

apt list | grep nvidia-3

nvidia-304/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-304-dev/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-304-updates/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-304-updates-dev/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-331/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-updates/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-updates-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-updates-uvm/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-uvm/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-updates/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-updates-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-updates-uvm/xenial 340.96-0ubuntu2 amd64
nvidia-340-uvm/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-346/xenial 352.63-0ubuntu3 amd64
nvidia-346-dev/xenial 352.63-0ubuntu3 amd64
nvidia-346-updates/xenial 352.63-0ubuntu3 amd64
nvidia-346-updates-dev/xenial 352.63-0ubuntu3 amd64
nvidia-352/xenial 361.42-0ubuntu2 i386
nvidia-352-dev/xenial 361.42-0ubuntu2 i386
nvidia-352-updates/xenial 361.42-0ubuntu2 i386
nvidia-352-updates-dev/xenial 361.42-0ubuntu2 i386
nvidia-361/xenial-updates,xenial-security 367.57-0ubuntu0.16.04.1 amd64
nvidia-361-dev/xenial-updates,xenial-security 367.57-0ubuntu0.16.04.1 amd64
nvidia-361-updates/xenial 361.42-0ubuntu2 i386
nvidia-361-updates-dev/xenial 361.42-0ubuntu2 i386
nvidia-367/xenial-updates,xenial-security 375.66-0ubuntu0.16.04.1 i386
nvidia-367-dev/xenial-updates,xenial-security 375.66-0ubuntu0.16.04.1 i386
nvidia-375/xenial-updates,xenial-security 384.130-0ubuntu0.16.04.1 i386
nvidia-375-dev/xenial-updates,xenial-security 384.130-0ubuntu0.16.04.1 i386
nvidia-375-diagnostic/unknown 375.88-0ubuntu1 amd64
nvidia-384/xenial-updates,xenial-security 384.130-0ubuntu0.16.04.1 i386
nvidia-384-dev/unknown 384.145-0ubuntu1 amd64
nvidia-384-diagnostic/unknown 384.145-0ubuntu1 amd64
nvidia-387/xenial 390.67-0ubuntu0~gpu16.04.1 amd64
nvidia-387-dev/xenial 390.67-0ubuntu0~gpu16.04.1 amd64
nvidia-390/xenial,now 390.67-0ubuntu0~gpu16.04.1 amd64 [residual-config]
nvidia-390-dev/xenial 390.67-0ubuntu0~gpu16.04.1 amd64
nvidia-390-diagnostic/unknown 390.30-0ubuntu1 amd64
nvidia-396/unknown 396.26-0ubuntu1 amd64
nvidia-396-dev/unknown 396.26-0ubuntu1 amd64
nvidia-396-diagnostic/unknown 396.26-0ubuntu1 amd64

I also tried to install fastai with PyPI:

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
pip install fastai

And I tried to remove nvidia-387:

sudo apt-get --purge remove nvidia-387
sudo apt-get -f install
sudo reboot now

but nothing works - the GPU is not recognized by torch.
How can I fix it?

Thank you!

See: https://docs.fast.ai/troubleshoot.html#correctly-configured-nvidia-drivers

One thing that's wrong for sure is that you're installing nvidia-387 with cuda92. You need cuda90 for 384 < nvidia < 396 - see the table at the end of the linked section above.
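The pairing rule can be sketched as a tiny helper (hypothetical, encoding only the two breakpoints stated above - check the linked table for the full mapping):

```python
def required_cuda_package(driver_major: int) -> str:
    """Map an nvidia driver major version to the conda cuda package
    implied above: cuda92 needs driver >= 396, cuda90 covers
    384 < driver < 396."""
    if driver_major >= 396:
        return "cuda92"
    if driver_major > 384:
        return "cuda90"
    return "driver too old for cuda9x - upgrade the driver first"
```

So a driver like 387 calls for cuda90, not the cuda92 used in the install commands above.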


I thought that I had the latest nvidia drivers installed but I was wrong.

After installing:

sudo apt purge nvidia-*
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-396
sudo reboot

everything worked. Thank you!


Hi,
I am also facing a similar issue. Can you please suggest how to fix it?

PicklingError: Can't pickle <function crop_pad at 0x000001F39C8A9A60>: it's not the same object as fastai.vision.transform.crop_pad

Code:

data = (ImageItemList.from_folder(path/'train')
        .random_split_by_pct(0.2)
        .label_from_folder()
        .transform(tfms, size=224, padding_mode='zeros')
        .databunch(bs=32, num_workers=0))
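For context (not fastai-specific): on Windows, DataLoader worker processes are spawned and receive their transforms via pickle, and pickle serializes a function by looking its qualified name up in its module. If the name crop_pad has meanwhile been rebound to a different object (e.g. by a re-import or a monkey-patch), that lookup fails with exactly this error. A minimal reproduction using a stand-in function:

```python
import pickle

def crop_pad(x):
    return x

original = crop_pad      # keep a handle on the first definition

def crop_pad(x):         # rebinding the name simulates a re-import/patch
    return x * 2

err = None
try:
    pickle.dumps(original)  # lookup by name now finds the *new* crop_pad
except pickle.PicklingError as exc:
    err = exc

print(err)  # mirrors the "it's not the same object as ..." error above
```

Restarting the kernel so that fastai.vision.transform.crop_pad is only ever defined once is the usual way out of this state.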

I am using fastai 1.0.24 on Windows 10.

Thanks,
Ritika
