Can't import QRNN or QRNNLayer

Running against fastai v1.0.52, and running this …

from fastai.text.models.qrnn import QRNN, QRNNLayer

returns the following RuntimeError (just including snippets because it’s looooooong):

Error building extension 'forget_mult_cuda': [1/3]
...
/usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
 #include_next <stdlib.h>
               ^~~~~~~~~~
compilation terminated.
...
FAILED: forget_mult_cuda_kernel.cuda.o 
...
In file included from /usr/include/crt/math_functions.h:8835:0,
                 from /usr/include/crt/common_functions.h:271,
                 from /usr/include/common_functions.h:50,
                 from /usr/include/cuda_runtime.h:115,
                 from <command-line>:0:
/usr/include/c++/6/cmath:45:23: fatal error: math.h: No such file or directory
 #include_next <math.h>
                       ^
compilation terminated.
ninja: build stopped: subcommand failed.

I pip installed the appropriate version of cupy (I’m on CUDA 9.1) via https://github.com/cupy/cupy.

Not sure if there is anything else I need to do. Any ideas?

Thanks

Looks like there is a compatibility issue with your system toolchain. What Linux distro are you using? A search turned up some similar issues on Ubuntu 18.04. You may want to report this to pytorch, after searching there for similar issues (I didn’t find any in a quick search). This seems like an issue with pytorch’s JIT C++ compilation rather than anything specific to fastai.
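For context: fastai doesn’t ship a prebuilt binary for this extension; on first import, pytorch compiles it on the fly via torch.utils.cpp_extension.load, which is where the compiler errors above come from. A minimal sketch of that mechanism (the source file names here are my guess based on your error output; check the fastai source for the real ones):

from torch.utils.cpp_extension import load

# pytorch shells out to your C++ compiler and nvcc at this point; the
# stdlib.h / math.h errors above come from this compile step
forget_mult_cuda = load(name='forget_mult_cuda',
                        sources=['forget_mult_cuda.cpp', 'forget_mult_cuda_kernel.cu'],
                        verbose=True)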

I was able to import the module and have the extension compile when using a conda install of fastai, so you could try that. I created an environment with conda create -n qrnn_test -c fastai -c pytorch fastai gxx_linux-64. Make sure to install a compiler into the conda environment (as in the gxx_linux-64 package) or it will use the system one.
I got warnings about an incompatible compiler, but I believe this is just due to a weird compiler name under conda. The build worked, though I didn’t actually test it beyond importing and checking the compiled extension was there:

>>> from fastai.text.models.qrnn import forget_mult_cuda
/home/user/.conda/envs/qrnn_test/lib/python3.7/site-packages/torch/utils/cpp_extension.py:166: UserWarning:
...
>>> forget_mult_cuda
<module 'forget_mult_cuda' (/tmp/torch_extensions/forget_mult_cuda/forget_mult_cuda.so)>

Ubuntu 18.04

Did that: conda create -n test python=3.7

What is the gxx_linux-64 though? That doesn’t look familiar to me … maybe I have to conda install that???

UPDATE:

Did a conda install gxx_linux-64 and now I get this warning followed by an exception stack trace:

Your compiler (~/anaconda3/envs/fastai-course-v3/bin/x86_64-conda_cos6-linux-gnu-c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension.

Tried restarting the conda environment and still get the warning and errors. Looks like it’s not using the compiler I just installed, but I’m not sure how to rectify that.

That is what seemed to be causing the issue in the reports of similar issues in other packages. I didn’t see any reports for pytorch itself; as I said, you might report it to pytorch. It seemed to be an issue with how certain include paths were handled on Ubuntu 18.04. I didn’t see any solutions that didn’t involve modifying pytorch, but you might check that you are using the most recent compiler packages (do an apt update && apt upgrade).

Yes, you needed to install gxx_linux-64; that is the GNU C/C++ compiler package (also called gcc/g++, hence the gxx). Installing it sets various environment variables that pytorch uses to locate a compiler, so it picks that one up instead of your system compiler (specifically, it reads the $CXX environment variable and finds the x86_64-conda_cos6-linux-gnu-c++ compiler).
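A quick way to check that the conda compiler is the one being picked up (the path below is just what it looks like on my machine; yours will differ):

>>> import os
>>> os.environ.get('CXX')  # should point into the conda env, not /usr/bin
'/home/user/.conda/envs/qrnn_test/bin/x86_64-conda_cos6-linux-gnu-c++'

If that prints None, the compiler package isn’t activated; try deactivating and reactivating the environment.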
Yes, I received that warning too, but it is just a warning; it still seemed to have built the extension. I looked at the code and it’s just looking for g++ in the name of the compiler executable, whereas Anaconda has called it x86_64-conda_cos6-linux-gnu-c++. Given the way the compiler is passed through the environment variable, there’s no easy way to change the name and satisfy pytorch.
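Roughly, the check pytorch is doing amounts to a substring match on the executable name, something like this (an illustrative sketch, not the actual pytorch source):

>>> import os
>>> compiler = os.environ.get('CXX', 'c++')
>>> 'g++' in os.path.basename(compiler)  # pytorch warns when this is False
False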
I’m pretty sure fastai.text.models.qrnn.forget_mult_cuda is the extension built by this process, so if it was created I’d at least try using QRNN and see if you get any further errors.
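If the import works, a minimal smoke test would look something like this (I’m assuming the constructor takes (input_size, hidden_size, n_layers) and batch-first input like the standard pytorch RNN modules; check the fastai docs for the exact signature):

>>> import torch
>>> from fastai.text.models.qrnn import QRNN
>>> qrnn = QRNN(10, 20, 1).cuda()     # assumed signature: input_size, hidden_size, n_layers
>>> x = torch.randn(2, 5, 10).cuda()  # batch of 2, sequence length 5, 10 features
>>> output, hidden = qrnn(x)          # exercises the compiled forget_mult_cuda kernel
>>> output.shape
torch.Size([2, 5, 20])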

Oh, I would note, as you said you were using CUDA 9.1: by default conda will install CUDA 10 along with pytorch. If you need CUDA 9 then you should do conda install pytorch torchvision cudatoolkit=9.0 -c pytorch, as per the pytorch instructions (there is also a cudatoolkit 9.2 but no 9.1 in conda, and the pytorch instructions say 9.0).
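You can confirm from python which CUDA toolkit your pytorch build is using (you’d expect ‘9.0’ after the install above):

>>> import torch
>>> torch.version.cuda  # the toolkit pytorch was built against
'9.0'
>>> torch.cuda.is_available()
True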

Nope … no glory.

I get the warning and then the exception and stack trace.

Thanks for your input though … not sure what to try next but will keep folks posted if I find a resolution.

I have the same error in Colab. Have you solved it?

Yup. See my post here:

In my case pip install ninja solved the issue, both on Kaggle and Colab.

nvcc and fastai reported the same CUDA version.
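For anyone who wants to run the same check, you can compare the toolkit nvcc reports against the version pytorch was built with (a mismatch between the two is a common cause of these JIT build failures):

>>> import subprocess, torch
>>> print(subprocess.run(['nvcc', '--version'], capture_output=True, text=True).stdout)  # system toolkit
>>> print(torch.version.cuda)  # toolkit pytorch was built with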
