Jeremy's Harebrained install guide

Continuing the discussion from Jeremy's Harebrained install guide:

Hey Jeremy, I took the liberty of changing the SSH URL for fastai_docs to the HTTPS one. Hope you don’t mind. :slight_smile:

I also ran into the same problem with the lldb package.

I haven’t encountered this personally, but I would double-check that $PATH points to the toolchain. I’d also check that the --swift-toolchain argument points to the correct toolchain path when running register.py for swift-jupyter. Are you able to start a fresh notebook with a Swift kernel? If so, what happens when you import TensorFlow? I would also test the REPL from the command line: what happens when you run swift and try to import TensorFlow there? That could help narrow it down.

I was able to get an ARM64 build of Swift for TensorFlow to work on the Jetson Nano via the 3/9/2019 snapshot. Unfortunately, this is with the API as it was around the 0.2 release, and the newer snapshots there appear to be packaged incorrectly, with libtensorflow missing from them.

Using that, I was able to build, run, and train a model I’d set up under the 0.2 API on the Jetson Nano. Unfortunately, it looks like this toolchain was built without GPU support, because despite having CUDA 10 and an appropriate cuDNN version on the board it looked to be running entirely on the CPU.

I have a Jetson Xavier here that I’m going to try to add storage to and use as a build machine to create current ARM64 GPU-enabled toolchains for the Jetson series of devices. For years, I’d checked in on the status of aarch64 Swift compiler builds, but there was always something blocking them from working. Really glad to see that the last of the problems have been solved, and this is now operational. Having a CUDA- and cuDNN-compatible computer for $99 opens up some neat possibilities.

I spent a solid day working through every variant I could think of. I could get SWTF working, but never with Jupyter notebook. That’s why I asked the SWTF team (who do monitor this post) to weigh in. So far, crickets.

I had issues installing too and switched to the Colab route. The error you see here seems to be fixed (as the S4TF team announced): S4TF in colab (error)

Here are the wget and tar commands for the update script above, for 0.3.1:
wget https://storage.googleapis.com/swift-tensorflow-artifacts/releases/v0.3.1/rc1/swift-tensorflow-RELEASE-0.3.1-cuda10.0-cudnn7-ubuntu18.04.tar.gz

tar xf swift-tensorflow-RELEASE-0.3.1-cuda10.0-cudnn7-ubuntu18.04.tar.gz
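The tarball name itself says which CUDA, cuDNN and Ubuntu versions the toolchain was built against, which matters for the library errors discussed further down. A minimal sketch that pulls those fields out of the name; the function name `parse_toolchain_name` is made up for illustration, and the pattern is only an assumption based on the one filename shown here:

```python
import re

# Pattern inferred from the single filename posted above; other releases
# may be named differently (assumption).
_TOOLCHAIN = re.compile(
    r"swift-tensorflow-RELEASE-(?P<release>[\d.]+)"
    r"-cuda(?P<cuda>[\d.]+)-cudnn(?P<cudnn>\d+)"
    r"-ubuntu(?P<ubuntu>[\d.]+)\.tar\.gz$"
)


def parse_toolchain_name(name):
    """Return the versions encoded in a toolchain file name or URL, or None."""
    match = _TOOLCHAIN.search(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_toolchain_name(
        "swift-tensorflow-RELEASE-0.3.1-cuda10.0-cudnn7-ubuntu18.04.tar.gz"
    ))
```

For the 0.3.1 file above this reports CUDA 10.0, cuDNN 7 and Ubuntu 18.04.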

Does anyone have an answer to this question? I have been trying to solve this for a week without success. I should probably delete the old version, but I’m not sure how to do it, and I don’t want to risk being without CUDA for some time.

@Lankinen I did fiddle with the symbolic links, following these instructions:

That all seemed to work fine, so be brave and go for it :wink: Little disclaimer: I didn’t get the entire install to work, but I’m stuck elsewhere.

With echo $PATH I get the following output:
/home/user/swift/usr/bin:/home/user/swift/usr/bin:/home/user/swift/usr/bin:/home/user/anaconda3/bin:/home/user/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
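The toolchain directory shows up three times in that output, which is harmless but makes the list harder to read. A rough sketch that drops the repeats and reports which swift binary the PATH actually resolves to; the helper names are hypothetical, and `shutil.which` is the standard-library lookup:

```python
import os
import shutil


def unique_path_entries(path_value):
    """Return PATH entries in order, dropping empty and repeated ones."""
    seen = set()
    entries = []
    for entry in path_value.split(os.pathsep):
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


def resolve_tool(name, path_value):
    """Return the first executable called `name` found on path_value, or None."""
    return shutil.which(name, path=path_value)


if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    for entry in unique_path_entries(path_value):
        print(entry)
    print("swift resolves to:", resolve_tool("swift", path_value))
```

If the last line prints something other than the swift inside the downloaded toolchain (or None), the PATH is the first thing to fix.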

I’m not sure I understand you correctly here. Could you clarify how to check the --swift-toolchain location?

If I open a fresh Swift notebook, the cell is never executed,
and I get the same ModuleNotFoundError: No module named '_lldb' as mentioned above, followed after some time by this message:
Error opening stream: HTTP 404: Not Found (Kernel does not exist

I have multiple conda environments, e.g., fastai+pytorch, tf, etc. - Can this cause this problem?
Do I need to activate a specific environment while going through the install steps, i.e., the tensorflow environment? (I tried this but this also does not seem to solve it.)

I am looking forward to solving this problem so I can get S4TF running on my machine and start playing with it. :slight_smile:

I tried building everything from scratch for the NVIDIA Jetson Nano. The first thing you have to do, according to the Building Swift for Tensorflow instructions, is to build Bazel. I was able to build it successfully using these instructions (it took several hours to compile, but it worked).

I then tried to compile SWTF, and most of it compiled, but when it got to the Bazel part of the build I got these errors:

Starting local Bazel server and connecting to it...
INFO: Analysed 2 targets (152 packages loaded, 15102 targets configured).
INFO: Found 2 targets...
ERROR: /home/foo/.cache/bazel/_bazel_bart/7492b370f194e9e9c86b17ac20a297fb/external/mkl_dnn/BUILD.bazel:101:1: C++ compilation of rule '@mkl_dnn//:mkldnn_single_threaded' failed (Exit 1)
In file included from external/mkl_dnn/src/cpu/rnn/../cpu_isa_traits.hpp:35:0,
                 from external/mkl_dnn/src/cpu/rnn/ref_rnn.hpp:27,
                 from external/mkl_dnn/src/cpu/rnn/cell_common.cpp:20:
external/mkl_dnn/src/cpu/rnn/../xbyak/xbyak_util.h:84:12: fatal error: cpuid.h: No such file or directory
   #include <cpuid.h>
            ^~~~~~~~~
compilation terminated.
INFO: Elapsed time: 61.231s, Critical Path: 13.46s
INFO: 20 processes: 20 local.
FAILED: Build did NOT complete successfully
utils/build-script: fatal error: command terminated with a non-zero exit status 1, aborting

Does anyone know how to fix this, so I can complete the build?

So your home folder is called user? I was just wondering if you could verify that the path in the last argument to the register.py script points to the correct place on disk. Based on what you posted, your toolchain should be located at ~/swift/. For what it’s worth, my S4TF env is still using Python 3.6. I think 3.7 works now, but I’m not 100% certain.

Yes, the directory structure in ~/swift/usr looks like this:
bin include lib libexec local share

I was looking more into the swift conda env:

  • Under /home/mmp/anaconda3/envs/ I do not have a swift directory. Is this OK?
  • Also, ~/.conda/environments.txt does not show a swift environment (nor does conda env list).

It looks like the Jupyter kernel is pointing nowhere, so maybe manually setting the swift conda env dir could solve the problem? However, I am not sure which directory in ~/swift/ or .../swift-jupyter/ is the env directory.

This is the full error output from Jupyter:

[I 07:07:53.540 NotebookApp] KernelRestarter: restarting kernel (4/5), new random ports
Traceback (most recent call last):
  File "/home/user/swift/usr/lib/python3.7/site-packages/lldb/__init__.py", line 35, in <module>
    from . import _lldb
ImportError: libcusparse.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/swift-jupyter/swift_kernel.py", line 19, in <module>
    import lldb
  File "/home/user/swift/usr/lib/python3.7/site-packages/lldb/__init__.py", line 39, in <module>
    import _lldb
ModuleNotFoundError: No module named '_lldb'
[W 07:07:56.548 NotebookApp] KernelRestarter: restart failed
[W 07:07:56.548 NotebookApp] Kernel 2a91dff9-f709-4e67-bbb2-47ece03a1d0a died, removing from map.
[W 07:08:41.579 NotebookApp] Timeout waiting for kernel_info reply from 2a91dff9-f709-4e67-bbb2-47ece03a1d0a
[E 07:08:41.583 NotebookApp] Error opening stream: HTTP 404: Not Found (Kernel does not exist: 2a91dff9-f709-4e67-bbb2-47ece03a1d0a)
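In a chained traceback like this one, the first exception is usually the real cause: here the `_lldb` import fails because the loader cannot open libcusparse.so.10.0, and the later ModuleNotFoundError is only the fallback import failing too. A small sketch for pulling the missing library names out of such a log; the function name is made up, and the pattern assumes the usual glibc loader wording seen above:

```python
import re

# Matches "<name>: cannot open shared object file", the loader message
# that appears in the traceback above.
_MISSING_LIB = re.compile(r"(\S+): cannot open shared object file")


def missing_shared_libraries(log_text):
    """Return the distinct library names the loader failed to open, in order."""
    names = []
    for match in _MISSING_LIB.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = (
        "ImportError: libcusparse.so.10.0: cannot open shared object file: "
        "No such file or directory\n"
        "ModuleNotFoundError: No module named '_lldb'\n"
    )
    print(missing_shared_libraries(sample))
```

Running this over the kernel log narrows the problem to the CUDA 10.0 libraries rather than to lldb or the conda environment.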

All my attempts failed because I couldn’t update glibc from 2.23 to 2.27 (using the Deep Learning AMI on AWS). Will try with a fresh Ubuntu 18.04. Swift notebooks worked, except that nvidia-smi failed with: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." Stopping and then restarting the instance solved the issue.

Sorry if you already mentioned this, but did you try running swift from your bash prompt and then import TensorFlow from the REPL? You might first try just referencing swift directly and then, if that doesn’t work, giving the full path to swift in the bin folder of your toolchain (e.g. ~/swift/usr/bin/swift or wherever it is). If you haven’t set up a fresh conda env, I’d recommend trying that. Also make sure you are in the correct conda environment when running python register.py --sys-prefix --swift-python-use-conda --use-conda-shared-libs --swift-toolchain ~/swift.

After creating a new conda swift env and running python -m ipykernel install --user --name swift --display-name "swift", I see the kernel in Jupyter, but when I run it, it behaves like a normal Python notebook.

If I run swift in bash I get a similar error:
swift: error while loading shared libraries: libcusparse.so.10.0: cannot open shared object file: No such file or directory

and with ~/swift/usr/bin/swift:

/home/user/swift/usr/bin/swift: error while loading shared libraries: libcusparse.so.10.0: cannot open shared object file: No such file or directory

I found this related thread and post: https://github.com/tensorflow/tensorflow/issues/26182#issuecomment-472674508
But this did not work for me.

However, if I search for the missing library, I find it in other conda envs but not in the swift env:

/home/user/anaconda3/pkgs/cudatoolkit-10.0.130-0/lib/libcublas.so.10.0
/home/user/anaconda3/envs/fastai-pytorch-nightly/lib/libcublas.so.10.0
/home/user/anaconda3/envs/fastai/lib/libcublas.so.10.0
/home/user/anaconda3/envs/fastai-dev/lib/libcublas.so.10.0
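The listing above shows libcublas.so.10.0; the same kind of search can be repeated for libcusparse.so.10.0, which is the name in the error. A minimal sketch of such a search in Python; `find_library` is a hypothetical helper, and the roots passed in (an anaconda3 directory and /usr/local) are only examples:

```python
import os


def find_library(filename, roots):
    """Return every path under the given roots whose file name equals `filename`."""
    hits = []
    for root in roots:
        # os.walk silently skips roots that do not exist or cannot be read.
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


if __name__ == "__main__":
    roots = [os.path.expanduser("~/anaconda3"), "/usr/local"]
    for path in find_library("libcusparse.so.10.0", roots):
        print(path)
```

Finding the file only inside other conda envs, as here, suggests the library exists on disk but is not on the search path the swift binary uses.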

Nvidia-smi shows CUDA 10.1 and in my swift env I have the following cudnn, cudatoolkit and tensorflow packages installed:

Name                      Version         Build                 Channel
cudnn                     7.3.1           cuda9.2_0
cudatoolkit               9.2             0
tensorflow                1.12.0          gpu_py36he74679b_0
tensorflow-base           1.12.0          gpu_py36had579c0_0
tensorflow-gpu            1.12.0          h0d30ee6_0
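The package list shows cudatoolkit 9.2, while the 0.3.1 toolchain posted earlier in the thread is named for cuda10.0. Whether the conda cudatoolkit is what swift ends up loading depends on the setup, so treat this only as a quick consistency check. A sketch with made-up helper names that reads `conda list` style rows and flags the mismatch:

```python
def parse_conda_list(text):
    """Map package name to version from `conda list` style rows."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header rows ("Name ..." or "# Name ...").
        if len(fields) < 2 or fields[0] in ("Name", "#"):
            continue
        versions[fields[0]] = fields[1]
    return versions


def cuda_mismatch(versions, required_cuda):
    """True if cudatoolkit is installed with a different major.minor version."""
    installed = versions.get("cudatoolkit")
    if installed is None:
        return False
    return installed.split(".")[:2] != required_cuda.split(".")[:2]


if __name__ == "__main__":
    rows = "cudnn 7.3.1 cuda9.2_0\ncudatoolkit 9.2 0\n"
    print(cuda_mismatch(parse_conda_list(rows), "10.0"))
```

The required version "10.0" is taken from the toolchain file name, not from any official compatibility table.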

Maybe it is CUDA 10.1? Maybe others with the same error can confirm this?

You may consider posting to the Google Group: https://groups.google.com/a/tensorflow.org/forum/#!forum/swift

They’re pretty responsive.

Working inside a Jupyter notebook is fine for me, but if I try to run swift at the command line to open the REPL, I still get this issue with the latest version:

Does anyone have any insight on this? I just want to make sure I don’t have something misconfigured.

I’m not sure S4TF works with 10.1 yet. I could be wrong, but last I checked it required CUDA 10.0 and cuDNN 7.5. Make sure you run the ldconfig part of Jeremy’s install guide. I’d also open /etc/ld.so.conf (sudo vim /etc/ld.so.conf) to make sure there’s not an older version (or 10.1) referenced there. Here’s what mine looks like:

include /etc/ld.so.conf.d/*.conf
/usr/local/cuda-10.0/lib64
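To check a config like this without reading it by eye, something along these lines lists the CUDA versions it references. This is only a sketch: the function name is invented, it assumes the /usr/local/cuda-X.Y/lib64 layout shown above, and it does not follow the include line into /etc/ld.so.conf.d/.

```python
import re

# Assumes CUDA is installed under /usr/local/cuda-<major>.<minor>/lib64,
# as in the config shown above.
_CUDA_DIR = re.compile(r"^/usr/local/cuda-(\d+\.\d+)/lib64/?$")


def cuda_versions_in_ld_conf(text):
    """Return the CUDA versions referenced by directory lines in ld.so.conf text."""
    versions = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = _CUDA_DIR.match(line)
        if match:
            versions.append(match.group(1))
    return versions


if __name__ == "__main__":
    with open("/etc/ld.so.conf") as handle:
        print(cuda_versions_in_ld_conf(handle.read()))
```

Anything other than a single 10.0 entry would be worth a closer look, given the libcusparse.so.10.0 errors earlier in the thread.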

And here’s what my path looks like:

(base) j@j:~$ echo $PATH
/usr/local/cuda/bin:/home/j/dev/swift/stf-builds/development/usr/bin:/home/j/anaconda3/bin:/home/j/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin

Did you ever find a solution to this?