Fastai v1 install issues thread

This is a pytorch installation issue, @gsg.

While some people reported that just rebooting the machine was all that was needed (i.e. your CUDA install wasn’t activated yet), chances are that the pytorch cudaXX build you installed doesn’t match your system’s CUDA version. conda will happily install any cudaXX build - it can’t tell which CUDA version your system has - so a successful install doesn’t mean your system will support it.

Check which apt packages you have installed for CUDA (or whether you built it from source) - perhaps they aren’t 9.2 (see below)? Otherwise please research on google, perhaps with the specifics of your distro - there are many, many threads about it, e.g. this one

And this cudnn thing I think is just plain misleading, as cudnn seems to be available even on systems w/o a GPU. I’m considering removing the cudnn part of the report.

update:

I will change the script to add the output of:

nvcc --version

which tells you the actual CUDA version you have on your system. I thought I could cheat and use torch.version.cuda, but as you can tell it only tells you which cudaXX build of pytorch you installed. I guess we need to report both.
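To make the comparison concrete, here is a minimal sketch of how one could read the system CUDA version out of nvcc programmatically and compare it against pytorch’s build. The helper names are mine, not part of fastai or pytorch:

```python
import re
import subprocess

def parse_nvcc_output(text):
    """Pull the full version (e.g. '9.2.148') out of `nvcc --version` output,
    which ends with a line like 'Cuda compilation tools, release 9.2, V9.2.148'."""
    m = re.search(r",\s*V([\d.]+)", text)
    return m.group(1) if m else None

def system_cuda_version():
    """Return the system CUDA version, or None if nvcc isn't on the PATH."""
    try:
        out = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
    except FileNotFoundError:
        return None
    return parse_nvcc_output(out)

# torch.version.cuda, by contrast, reports which cudaXX build pytorch itself
# was compiled against - comparing the two exposes a mismatch:
#   import torch
#   torch.version.cuda == system_cuda_version()
```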

update2: I reworked the reporting tool to add nvcc info

please try:

git pull
python -c 'import fastai; fastai.show_install(1)'

update: this post might still be of interest to someone, but we have just updated the README to indicate that the system-installed cuda version no longer matters, i.e. just install pytorch-nightly with cuda92 for now no matter what (as long as you have an NVIDIA GPU, that is)


Let me show a correct and an incorrect installation of the pytorch cuda build, and how you can tell which one is which.

So currently on my system with correctly installed cuda92, I get:

$ python -c 'import fastai; fastai.show_install(0)'
[...]
torch cuda  : 9.2.148
nvcc  cuda  : 9.2.148
[...]

You can see that the installed `torch` package was built against the correct cuda.

Now observe this:

I uninstalled all critical conda/pip fastai-v1 related packages:

conda uninstall -y fastai pytorch-nightly cuda92 torchvision-nightly
pip uninstall -y fastai torch-nightly torchvision-nightly

and on purpose installed the wrong cuda90 (when my system’s cuda is cuda92):

conda install -y -c pytorch pytorch-nightly cuda90
conda install -y -c fastai torchvision-nightly
conda install -y -c fastai fastai

notice that conda had no complaints!

et voilà:

$ python -c 'import fastai; fastai.show_install(0)'
[...]
torch cuda  : 9.0.176
nvcc  cuda  : 9.2.148
[...]

now we know what the problem is. You can see that the installed `torch` package was built against the wrong cuda build.

You can also tell which pytorch conda build you have via:

$ conda list pytorch-nightly
pytorch-nightly           1.0.0.dev20181009 py3.6_cuda9.0.176_cudnn7.1.2_0    pytorch

so you can see it’s a cuda-9.0.176 build! This is the wrong build for my cuda-9.2 system.
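The build string follows a regular pattern, so - purely as an illustration, this helper is mine and not a conda API - you can pull the cuda version out of it with a one-line regex:

```python
import re

def cuda_from_build_string(build):
    """Extract the CUDA version from a conda build string like
    'py3.6_cuda9.0.176_cudnn7.1.2_0'; returns None for CPU-only builds."""
    m = re.search(r"cuda([\d.]+)", build)
    return m.group(1) if m else None
```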

I couldn’t find how to get the same level of information from pip, other than a hackish:

$ pip show -f torch_nightly | grep cudart
  torch/lib/libcudart-72ec04ea.so.9.2

so you can tell from the library version that I have pytorch w/ cuda 9.2 installed with pip.
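The same parsing trick works here (again just an illustrative helper of mine, not a pip feature): the trailing part of the bundled library’s filename is the CUDA version it was built with.

```python
import re

def cuda_from_libcudart(path):
    """Extract the CUDA version from a bundled library path like
    'torch/lib/libcudart-72ec04ea.so.9.2'."""
    m = re.search(r"libcudart-[0-9a-f]+\.so\.([\d.]+)", path)
    return m.group(1) if m else None
```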

When you try to uninstall the wrong packages, please note, that with conda it’s not enough to:

conda uninstall -y cuda92

You must do:

conda uninstall -y pytorch-nightly cuda92

In the case of pip it’s enough to do:

pip uninstall -y torch-nightly 

and then re-install the correct version.

and yes, it’s very confusing that pip and conda pytorch packages have different names.

p.s. I used -y a lot in the examples; it just tells the programs to proceed with installing/uninstalling without asking for confirmation. Feel free to remove the -y when you do your own experiments.


I conda-created a new env on my Mac, then followed the github instructions to conda install pytorch, etc. My jupyter notebook didn’t see the new env until I checked with:

jupyter kernelspec list

and registered it with:

python3 -m ipykernel install --name new-env

Hope this helps anyone with a similar problem.

$ python -c 'import fastai; fastai.show_install(1)'

platform   : Linux-4.15.0-36-generic-x86_64-with-debian-stretch-sid
distro     : Ubuntu 16.04 Xenial Xerus
python     : 3.6.5
fastai     : 1.0.6.dev0
torch      : 1.0.0.dev20181008
torch cuda : Not available
torch cuda : 9.2.148
nvcc  cuda : 9.2.148
torch gpus 
no supported gpus found on this system

Also, notice that:

$conda list pytorch-nightly
# packages in environment at /home/german/anaconda3/envs/fastai:
#
# Name                    Version                   Build  Channel
pytorch-nightly           1.0.0.dev20181009 py3.6_cuda9.2.148_cudnn7.1.4_0  [cuda92]  pytorch

So the pytorch nightly was built against 9.2.148, same as I have installed…
Maybe they had a problem with it and went back to 9.0?

Problem solved… As @stas indicated, it was a pytorch - cuda issue.
I removed older installations of cuda, then used apt-get to get 9.2 in place…
And now pytorch sees the GPU… :)
Thanks!


I tried v1 on FloydHub’s platform and it was very seamless. I tried the CPU variant only as at this point of time, I don’t have any GPUs on my FloydHub account. I tried the vision API and it was so much fun. If anyone wants help in setting up v1 on a FloydHub work-space please feel free to ask me.

Happy to hear you got it resolved, @gsg .

I’m still unclear why, given that you did have matching cuda versions in torch and on the system, cuda still wasn’t available.

Did you by chance record the exact commands you ran to fix this?

In particular, which apt packages got removed, and which replaced them?

@stas
I have the following on my shell history…

sudo apt-get install linux-headers-$(uname -r)
conda install -y -c pytorch pytorch-nightly cuda92
sudo /usr/bin/nvidia-uninstall
sudo apt-get --purge remove nvidia-387
sudo apt-get -f install
sudo reboot now

This is helpful. You dumped the old nvidia-387 driver.

So which nvidia driver do you have installed now? Which is probably the key to your solution.

Yes, there were some comments on the CUDA forums about incompatibilities arising from multiple installs…
Currently I have:
NVIDIA-SMI 410.48 Driver Version: 410.48


And please see the updated README: you can now always install cuda92, no matter what you have on your system (as long as the NVIDIA driver is running).

@tschoy, thank you for sharing your tip.

For the future, please note that it is always helpful when the poster shares at least the specific version of their OS, as a lot of them have their own quirks. Then readers know whether the tip applies to them or not.

Also, what was the output of jupyter kernelspec list - nothing, I suppose?

Trying to reproduce your problem I have added a jupyter CLI test to the mac CI build which runs macOS-10.13.

After installing fastai, the test was:

jupyter nbconvert --execute --ExecutePreprocessor.timeout=600 --to notebook examples/tabular.ipynb 

and it worked just fine. I didn’t need to install ipykernel.

I also added a test:

jupyter kernelspec list | grep python3

i.e. it has a python3 kernel from the get-go.

Seems to work just fine on mac conda/pip build.

I was thinking of next trying to emulate a jupyter notebook run, but it seems that what I did above is equivalent to one.

So I’m not sure what else to try to reproduce it.

Could you possibly create a new environment, go up to the point where you needed to do what you described, but not do it, and just list the kernelspec with the command above?

and then paste the whole sequence of commands you ran from beginning to end. No need for outputs, just the bash history. Plus the output of jupyter kernelspec list from before you needed to run:

python3 -m ipykernel install --name new-env

and finally, what was the error you received that led you to install a new kernelspec, and what operation were you doing when the error occurred?

I’m trying to see whether it was empty, or something else is wrong.

Thanks.

@stas Thanks for the reminder. Sorry for the confusion, I didn’t make it clear: it’s not a mac-specific issue, but a conda issue. Installing into a single or an existing environment should not have this problem.

I was only referring to this issue:

Conda stopped automatically registering NEW environments for jupyter. This is only an issue for people who want to have multiple environments, e.g. who want to keep different versions working during major transitions. (I’m using conda 4.5.11)

A dumb question, but how do we update the fastai library in a developer install? When I tried to git pull, this error appeared:

Updating dc60a37..d56dfb8
error: Your local changes to the following files would be overwritten by merge:
examples/cifar.ipynb
examples/text.ipynb
Please, commit your changes or stash them before you can merge.
Aborting

I tried git reset --hard but it didn’t solve the problem.

Thank you so much in advance

Just do what it suggests - git commit or git stash. Lots more info via google about that error.

It has to do with the nbstripout filter. (tools/fast-nbstripout).

It’s normally not a problem unless a developer w/ commit rights forgets to run:

 tools/trust-origin-git-config

which enables the filters, and instead commits an unstripped notebook.

When that happens, it messes up everybody’s checkouts.

To recover from that you need to disable the filter, clean up your checkout and re-enable the filter.

tools/trust-origin-git-config -d
git stash
git pull
tools/trust-origin-git-config

You can do git checkout examples in this case, or a reset - but you have to do it after you have disabled the filters, and then remember to re-enable them.

The instructions include this setup for everybody. Unfortunately, due to git’s security model, there is no way to enforce this instrumentation, other than perhaps installing a server-side git hook which reviews each commit and refuses pushes that contain unstripped notebooks.

The stripout filter allows us to collaborate on the notebooks w/o conflicts over different execution counts, locally installed extensions, etc., keeping only the essentials under git. Ideally even the outputs would be stripped, but that’s a problem if one uses the notebooks for demos, as is the case with the example nbs.
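For the curious, the core of what such a stripout filter does is tiny: a notebook is just JSON, and the filter zeroes out the volatile fields before git sees them. A simplified sketch (my own illustration, not the actual fastai tool):

```python
import json  # used when wiring this up as a git filter, see comment below

def strip_notebook(nb):
    """Remove execution counts and outputs from a notebook dict, so that
    merely re-running cells doesn't produce spurious git diffs."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            cell["outputs"] = []  # a filter that keeps demo outputs would skip this line
    return nb

# As a git clean filter this would read the notebook on stdin and write the
# stripped JSON to stdout, e.g.:
#   json.dump(strip_notebook(json.load(sys.stdin)), sys.stdout, indent=1)
```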


Might be worth putting that in a little script?


This is a good idea, except we can’t make assumptions about the state of the user’s checkout. Any such automated script would wipe it clean (git stash pop may work, or may produce merge conflicts). So it’d be a dangerous script to have, since some user modifications to the notebooks (and other code) could be lost.

I think if a complete reset is wanted, and perhaps to preserve the state of the old checkout, a fresh git clone would be a much safer approach. Then the user knows exactly what they are doing, i.e. “I’m discarding this checkout and getting a fresh one”.


Yes, I have seen that SO thread. But I’m still unable to reproduce your issue. I have conda 4.5.11 as you do and I have a whole bunch of environments.

I just did:

conda create -y  python=3.6 --name fastai-py3.6
conda activate fastai-py3.6
conda install -y conda
conda install -y pip setuptools
conda install -y -c pytorch pytorch-nightly cuda92
conda install -y -c fastai torchvision-nightly
conda install -y -c fastai fastai
conda uninstall -y fastai
pip install -e .[dev]

and then:

jupyter kernelspec list
Available kernels:
  python3    /home/stas/anaconda3/envs/fastai-py3.6/share/jupyter/kernels/python3

How is your setup different from mine? That’s why I was asking you for the exact sequence of commands you used.


Thanks for the information, @stas. I was thinking that maybe the problem came from /tools, but because I don’t have much experience with git yet, I let myself ask it here :D. I will read your comment carefully.