Fastai v1 install issues thread


(Todd Doucet) #166

Follow-up: I got my local pytorch and fastai to work after updating the NVIDIA driver, by installing cuda as follows:

sudo apt-get install nvidia-cuda-toolkit

Apparently, pytorch needs both its own internal cuda library, as well as a system libcuda.so.1 to be present, in order to get past initialization.
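To check whether that system library is actually visible, here is a quick sketch using only the Python standard library (the soname is the standard Linux driver library name):

```python
import ctypes

def has_libcuda():
    """Return True if the dynamic linker can load the NVIDIA driver library."""
    try:
        ctypes.CDLL("libcuda.so.1")  # roughly the same dlopen pytorch performs at init
        return True
    except OSError:
        return False

print("libcuda.so.1 present:", has_libcuda())
```

If this prints False, pytorch's CUDA initialization would presumably fail in the same way.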


(Stas Bekman) #167

Perhaps your installation is broken or incomplete? If you follow the apt path, the nvidia-396 apt package should have installed libcuda1-396, which includes libcuda.so.1:

$ dpkg -S  libcuda.so.1
libcuda1-396: /usr/lib/i386-linux-gnu/libcuda.so.1
libcuda1-396: /usr/lib/x86_64-linux-gnu/libcuda.so.1

$ apt list libcuda1-396
Listing... Done
libcuda1-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

$ apt-cache rdepends libcuda1-396 | grep nvidia
  nvidia-396

$ apt list nvidia-396
  nvidia-396/unknown,now 396.44-0ubuntu1 amd64 [installed,automatic]

Of course, change 396 to whatever version you’re using.

The first command finds the apt package the file belongs to, the second checks that the package is installed, the third lists which packages depend on it, and the fourth confirms that the depending parent package is installed. So basically, if you were to apt install nvidia-396, you would get libcuda1-396 installed, and with it libcuda.so.1.

This is on Ubuntu-18.04.

Now documented here:
https://docs-dev.fast.ai/troubleshoot.html#libcudaso1-cannot-open-shared-object-file


(Stas Bekman) #168

@tdoucet, can you please send the output of this on the setup you had the issue with:

python -c 'import fastai; fastai.show_install(0)'

and also, if possible, how you installed the nvidia drivers: manually or via apt - and if the latter, what the command was.


(Todd Doucet) #169

The apt path you describe shows that libcuda1-396 depends on nvidia-396, but that does not necessarily mean that you’ll get libcuda1-396 when you install nvidia-396.

On my system, the nvidia-396 package “recommends” libcuda1-396, but does not install it by default because the nvidia-396 package does not “require” it. This is on Linux Mint 18.3. I believe this situation is consistent also with the listing you provided.

However, this does indicate that it should work to explicitly install the libcuda1-396 package, with

apt-get install libcuda1-396

and indeed that gets you a better version of cuda than what I tried before.

(For completeness, I used the “-410” variants and not “-396” but it probably doesn’t matter and I wanted to keep the example the same.)

So, to summarize, on my system, at least, I updated the nvidia driver, and also the recommended external cuda (required by pytorch for peripheral purposes) using the following commands, and that made it work for me:

sudo apt-get install nvidia-410
sudo apt-get install libcuda1-410

It is a little unfortunate that pytorch is almost, but not quite, independent of the system-installed cuda. It packages its own, but still needs an external one. This makes fastai have the same dependency. This is unfortunate, because there are potentially other clients of cuda that might have their own requirements for the system-installed cuda. For example, doing the above broke my working TensorFlow setup. I’ll rebuild TensorFlow and can fix it, but this is an example of fastai (via pytorch) being not quite self-contained. I think for the final 1.0 version of the library it would be much better if it were.


(Stas Bekman) #170

You’re absolutely correct, @tdoucet. I approached it from the wrong direction.

apt-cache depends nvidia-396 | grep libcuda
  Recommends: libcuda1-396

So, yes, a bummer it is.

Strangely enough, there is only one report of this missing library on the pytorch forums. I guess most users haven’t yet followed fastai’s recent recommendation to skip the system-wide CUDA install. I agree that this is a very confusing situation. Let me research it some more.

Your input was very useful, Todd.


(Stas Bekman) #171

Update: it’s a build bug in pytorch - it will get fixed in Friday’s nightly build. libcuda.so.1 shouldn’t need to be installed system-wide.


(Joseph Russavage) #172

No worries. Thanks for letting me know.


(Andrew Reece) #173

I’m using an account I have on a university server to do my fastai work, as it has a GPU (Tesla P100) I can use. I was able to install fastai without a problem and I’ve been using this setup for my coursework.
(To be specific: I ssh into my uni account, then I ssh from there onto a GPU node. In that GPU-node shell, I source activate my conda env, and try to do all my installing/updating in there.)

My account doesn’t have root permissions, however, so sometimes this causes issues when I’m trying to update conda, fastai, etc. In particular, I can’t run conda update conda, or use conda to update fastai, so I’ve been using pip install --upgrade fastai, even though I’m working in a conda environment. But this has started causing problems: this week I began getting errors saying that conda is out of date when I try it.

When I run conda update conda when I’m not inside my conda virtual environment, I get this message:

CondaIOError: Missing write permissions in: /gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1
#
# You don't appear to have the necessary permissions to update packages
# into the install area '/gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1'.
# However you can clone this environment into your home directory and
# then make changes to it.
# This may be done using the command:
#
# $ conda create -n my_root --clone="/gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1"

But when I activate my conda environment and try the same command, I get:

$ conda update conda

PackageNotInstalledError: Package is not installed in prefix.
  prefix: /users/a/r/areece1/.conda/envs/my_root_gpu
  package name: conda

I haven’t been able to figure out how to get my env to recognize that conda is available as a package.
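For what it’s worth, the first error message itself points at a possible workaround: clone the read-only system installation into a writable location in your home directory, after which that clone’s conda can update itself. A sketch, using the env name and path from the messages above:

```shell
# Clone the read-only system install into a writable env (no root needed)
conda create -n my_root --clone="/gpfs1/arch/x86_64-rhel7/anaconda3-5.0.1"
source activate my_root

# conda now lives inside the writable clone, so these should work
conda update conda
conda install -c fastai fastai
```

The clone can be large, so check your home-directory quota first.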

I know this is sort of a bespoke problem, but I’m hoping someone may be able to help me out here - I’ve googled around for the errors I’m getting, but all the answers I’ve found seem to rely on the ability to use sudo (or they involve doing Linux-y things beyond my current level of understanding).

I’m on RHEL 7.5, conda 4.3.30, Python 3.6.6. Thanks in advance!


(Derek) #174

Hi @agr, I had similar issues when trying to set up the fastai library on a scientific computing cluster. After reading this post I ended up contacting a sys-admin to update the conda package manager in the module I was loading to run fastai. Using conda 4.5.11 solved those issues for me.

Also, because I have little room in my home folder, I’ve tried modifying my Config() file to download data and models somewhere other than my home directory. I now get errors in both the untar_data() and create_cnn() functions in the first notebook, where they just won’t download the data or the pretrained model, and yet they return objects (path, learner) which are not usable. Does anybody have experience with this?
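In case it helps others with the same space constraints: in fastai v1 the download locations come from `~/.fastai/config.yml`. A sketch of redirecting both, assuming the `data_path`/`model_path` keys (the `/scratch/username` paths are placeholders):

```yaml
# ~/.fastai/config.yml -- paths are placeholders, adjust to your cluster
data_path: /scratch/username/fastai/data
model_path: /scratch/username/fastai/models
```

If those directories don’t exist or aren’t writable, downloads can fail in the way described above, so it’s worth verifying them first.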


(Thomas) #175

Why is pytorch-nightly so slow to install with conda?


(Mpho Mphego) #176

Hi All

I need your help, I have a Docker image that I set up - with fastai installed via pip

Note: Everything worked well until this morning, when I tried to rebuild the docker image.

When I run my notebook, and try to execute lesson 1 it complains that python dependencies cv2, seaborn and bcolz are not installed.

When I check setup.py, it doesn’t look like opencv and bcolz are defined as requirements.

Please help!
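Until those dependencies are declared upstream, one workaround is to install them explicitly in the Dockerfile or running container (`opencv-python` is the pip package that provides the `cv2` module):

```shell
pip install opencv-python seaborn bcolz
```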


(Christian Werner) #177

Apologies if this has been answered (could not find it).

Has anyone deployed a fastai model on a server (with dokku)? I need to build a requirements.txt file (no conda), but I don’t know how to specify this in the file.

On the command line on my Mac I can use:

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

and this works (gives me a cpu-only install)

My current requirements.txt file:

  fastai
  Flask
  matplotlib
  numpy
  pandas
  Pillow
  plotly
  torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

With this file I get deployment errors:

remote: + [[ Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: ) == --* ]]
remote: + [[ Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: ) == ==* ]]
      ' 'Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: )'
       Could not find a version that satisfies the requirement torch_nightly (from -r /tmp/build/requirements.txt (line 1)) (from versions: )
remote: + read -r line
remote: + [[ No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1)) == --* ]]
remote: + [[ No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1)) == ==* ]]
      ' 'No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1))'
       No matching distribution found for torch_nightly (from -r /tmp/build/requirements.txt (line 1))

(Mpho Mphego) #178

Fixed it…


(Christian Werner) #179

To answer my own question:

Seems like the -f --find-links option does not work for the nightlies (or I’m using it wrong)…

This worked in my requirements.txt file:

numpy
torchvision_nightly
https://download.pytorch.org/whl/nightly/cpu/torch_nightly-1.0.0.dev20181105-cp37-cp37m-linux_x86_64.whl ; sys_platform == "linux"
fastai

(Fred Guth) #180

Is torch 1.0.0.dev20181005 the right version? I can’t run course-v3 lesson2-sgd without changing the syntax.


(Youcef Djeddar) #181

Hi,

I have tried every possible way to install the fastai library properly on my OS X machine, without success. When I import it in a Jupyter Notebook, the fastai.imports module works just fine, but fastai.structured does not. I keep getting this annoying error: ModuleNotFoundError: No module named 'fastai.structured'

I tried the pip command, I tried following the installation instructions provided by the Github page of fastai but nothing worked out so far.
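One thing worth checking: fastai.structured was part of the old fastai 0.7 codebase and does not exist in fastai v1, so if pip installed a 1.x release this error is expected. A quick version check:

```shell
python -c "import fastai; print(fastai.__version__)"
```

If it prints 1.x, you’d need either the 0.7 install instructions or the v1 equivalents of those functions.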

Thank you for your help!


#182

Hello!

torch cuda is Not available

My machine: Paperspace (Fastai) Ubuntu 16.04 with P4000 GPU
I tried to install it with Conda:

conda create --name baseclone --clone base
source activate baseclone
conda update -n baseclone conda
conda update --all
conda install -c pytorch pytorch-nightly cuda92
conda install -c fastai torchvision-nightly
conda install -c fastai fastai

python -c 'import fastai; fastai.show_install(1)'

=== Software === 
python version : 3.6.6
fastai version : 1.0.21
torch version  : 1.0.0.dev20181108
torch cuda ver : 9.2.148
torch cuda is  : **Not available** 

=== Hardware === 
No GPUs available 

=== Environment === 
platform       : Linux-4.4.0-128-generic-x86_64-with-debian-stretch-sid
distro         : #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018
conda env      : baseclone
python         : /home/paperspace/anaconda3/envs/baseclone/bin/python
sys.path       : 
/home/paperspace/anaconda3/envs/baseclone/lib/python36.zip
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6/lib-dynload
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6/site-packages
/home/paperspace/anaconda3/envs/baseclone/lib/python3.6/site-packages/IPython/extensions
no supported gpus found on this system

apt list | grep nvidia-3

nvidia-304/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-304-dev/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-304-updates/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-304-updates-dev/xenial 304.137-0ubuntu0~gpu16.04.1 amd64
nvidia-331/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-updates/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-updates-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-updates-uvm/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-331-uvm/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-updates/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-updates-dev/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-340-updates-uvm/xenial 340.96-0ubuntu2 amd64
nvidia-340-uvm/xenial 340.107-0ubuntu0~gpu16.04.1 amd64
nvidia-346/xenial 352.63-0ubuntu3 amd64
nvidia-346-dev/xenial 352.63-0ubuntu3 amd64
nvidia-346-updates/xenial 352.63-0ubuntu3 amd64
nvidia-346-updates-dev/xenial 352.63-0ubuntu3 amd64
nvidia-352/xenial 361.42-0ubuntu2 i386
nvidia-352-dev/xenial 361.42-0ubuntu2 i386
nvidia-352-updates/xenial 361.42-0ubuntu2 i386
nvidia-352-updates-dev/xenial 361.42-0ubuntu2 i386
nvidia-361/xenial-updates,xenial-security 367.57-0ubuntu0.16.04.1 amd64
nvidia-361-dev/xenial-updates,xenial-security 367.57-0ubuntu0.16.04.1 amd64
nvidia-361-updates/xenial 361.42-0ubuntu2 i386
nvidia-361-updates-dev/xenial 361.42-0ubuntu2 i386
nvidia-367/xenial-updates,xenial-security 375.66-0ubuntu0.16.04.1 i386
nvidia-367-dev/xenial-updates,xenial-security 375.66-0ubuntu0.16.04.1 i386
nvidia-375/xenial-updates,xenial-security 384.130-0ubuntu0.16.04.1 i386
nvidia-375-dev/xenial-updates,xenial-security 384.130-0ubuntu0.16.04.1 i386
nvidia-375-diagnostic/unknown 375.88-0ubuntu1 amd64
nvidia-384/xenial-updates,xenial-security 384.130-0ubuntu0.16.04.1 i386
nvidia-384-dev/unknown 384.145-0ubuntu1 amd64
nvidia-384-diagnostic/unknown 384.145-0ubuntu1 amd64
nvidia-387/xenial 390.67-0ubuntu0~gpu16.04.1 amd64
nvidia-387-dev/xenial 390.67-0ubuntu0~gpu16.04.1 amd64
nvidia-390/xenial,now 390.67-0ubuntu0~gpu16.04.1 amd64 [residual-config]
nvidia-390-dev/xenial 390.67-0ubuntu0~gpu16.04.1 amd64
nvidia-390-diagnostic/unknown 390.30-0ubuntu1 amd64
nvidia-396/unknown 396.26-0ubuntu1 amd64
nvidia-396-dev/unknown 396.26-0ubuntu1 amd64
nvidia-396-diagnostic/unknown 396.26-0ubuntu1 amd64

I also tried to install fastai with PyPI:

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
pip install fastai

And I tried to remove nvidia-387:

sudo apt-get --purge remove nvidia-387
sudo apt-get -f install
sudo reboot now

but nothing worked - the GPU is still not recognized by torch.
How can I fix it?

Thank you!


(Stas Bekman) #183

See: https://docs.fast.ai/troubleshoot.html#correctly-configured-nvidia-drivers

One thing that is definitely wrong: you’re pairing nvidia-387 with cuda92. You need cuda90 for 384 < nvidia < 396 - see the table at the end of the linked section above.
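To see which driver is actually loaded before choosing between cuda90 and cuda92, nvidia-smi can report it directly (assuming the driver is installed and the machine has been rebooted since):

```shell
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```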


#184

I thought that I had the latest nvidia drivers installed but I was wrong.

After running:

sudo apt purge nvidia-*
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-396
sudo reboot

everything worked. Thank you!


(ritika) #185

Hi,
I am also facing a similar issue. Can you please suggest how to fix it?

PicklingError: Can’t pickle <function crop_pad at 0x000001F39C8A9A60>: it’s not the same object as fastai.vision.transform.crop_pad

Code:

data = (ImageItemList.from_folder(path/'train')
        .random_split_by_pct(0.2)
        .label_from_folder()
        .transform(tfms, size=224, bs=32, padding_mode='zeros', num_workers=0)
        .databunch())
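As a side note, `bs` and `num_workers` are normally arguments of `databunch()` rather than `transform()` in the fastai v1 data block API; a rearranged sketch (untested, and `num_workers=0` is the usual workaround for pickling errors on Windows):

```python
# Assumes path and tfms are defined as in the original snippet
data = (ImageItemList.from_folder(path/'train')
        .random_split_by_pct(0.2)
        .label_from_folder()
        .transform(tfms, size=224, padding_mode='zeros')
        .databunch(bs=32, num_workers=0))  # num_workers=0 avoids multiprocessing pickling on Windows
```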

I am using fastai 1.0.24 ,windows 10

Thanks,
Ritika