Unable to run fast.ai on self-compiled PyTorch under Ubuntu :(

Hello everyone,

Thanks not only for developing and sharing fast.ai as a library, but also for the wonderful courses and the community around it!

Unfortunately, I am currently unable to use the library because the precompiled PyTorch package (installed by your setup script for Paperspace) does not work on my machine.
I am using an ‘old’ AMD CPU that lacks an instruction set required by the precompiled PyTorch package, so I am forced to compile it myself…
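To see which instruction-set extensions a CPU advertises, one can read the flags line of /proc/cpuinfo on Linux. The post does not say which instruction set the precompiled package needed, so the candidate flags below (sse4_1, sse4_2, avx, avx2) are only an assumption about the usual suspects, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed candidates: the post does not name the instruction set that was required.
CANDIDATE_FLAGS = ("sse4_1", "sse4_2", "avx", "avx2")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_cpu_flags(cpuinfo_text, wanted=CANDIDATE_FLAGS):
    """Hypothetical helper: list the wanted flags that the CPU does not advertise."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # only present on Linux
    if cpuinfo.exists():
        print("missing:", missing_cpu_flags(cpuinfo.read_text()))
```

An "Illegal instruction" crash on import combined with a non-empty result here would point towards building from source, as done in this thread.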

I am running Ubuntu on a local machine. I am not tied to a particular release and can switch freely: I am currently on 17.10 and have also tested 16.04. I am willing to switch to any distribution (headless release), as long as I can run your code.

I installed the CUDA 9.0 repository package http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb and the cuDNN archive http://files.fast.ai/files/cudnn-9.1-linux-x64-v7.tgz (as in http://files.fast.ai/setup/paperspace).

My Python 3.6.3 packages are:
bcolz 1.2.0
bleach 2.1.3
certifi 2017.4.17
cycler 0.10.0
decorator 4.2.1
entrypoints 0.2.3.post1
html5lib 1.0.1
ipykernel 4.6.1
ipython 6.2.1
ipython-genutils 0.2.0
ipywidgets 7.1.2
isoweek 1.3.3
jedi 0.11.1
Jinja2 2.9.6
jsonschema 2.6.0
jupyter 1.0.0
jupyter-client 5.1.0
jupyter-console 5.2.0
jupyter-core 4.3.0
MarkupSafe 1.0
matplotlib 2.2.2
mistune 0.7.4
nbconvert 5.3.1
nbformat 4.4.0
notebook 5.4.1
numpy 1.14.2
olefile 0.45.1
opencv-python 3.4.0.12
pandas 0.22.0
pandas-summary 0.0.41
pandocfilters 1.4.2
pexpect 4.2.1
pickleshare 0.7.4
Pillow 5.0.0
prompt-toolkit 1.0.15
ptyprocess 0.5.2
Pygments 2.2.0
pyparsing 2.2.0
python-dateutil 2.7.1
pytz 2018.3
PyYAML 3.12
pyzmq 16.0.2
qtconsole 4.3.1
scipy 1.0.1
seaborn 0.8.1
simplegeneric 0.8.1
six 1.11.0
terminado 0.6
testpath 0.3.1
tornado 4.5.3
tqdm 4.19.8
traitlets 4.3.2
wcwidth 0.1.7
webencodings 0.5.1
widgetsnbextension 3.1.4

I cloned PyTorch v0.3.1 from https://github.com/pytorch/pytorch and compiled it with an ancient gcc version (following https://medium.com/@Inoryy/compiling-pytorch-4-0-on-ubuntu-17-10-with-cuda-9-0-and-python-3-6-6769a0df56d5).

When I run the following code from lesson 1:
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *
PATH = "data/dogscats/"
sz=224
torch.cuda.is_available()
torch.backends.cudnn.enabled
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
The last line crashes.

Line 20 of fastai/courses/dl1/fastai/core.py, res = torch.FloatTensor(a.astype(np.float32)), raises RuntimeError: tried to construct a tensor from a nested float sequence, but found an item of type numpy.float32 at index (0, 0, 0, 0).
If I change this line to res = torch.FloatTensor(a.astype(np.float32).tolist()) and also adjust line 18, the error disappears, but the code then runs for a very long time without making any progress (I stopped it after approx. 20 minutes). It is clearly hanging somewhere… Please help me debug this further.
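The tolist() workaround is consistent with the error message: the constructor apparently walked the array as a nested Python sequence and rejected the numpy.float32 items it found, whereas ndarray.tolist() converts every element to a built-in Python float. The NumPy-only sketch below (with a small made-up array standing in for an image batch) shows that difference, and also why the workaround is expensive: it creates one Python object per element. On builds with NumPy support, torch.from_numpy() is the usual direct route instead.

```python
import numpy as np

# Made-up stand-in for an image batch: (batch, channels, height, width).
a = np.arange(2 * 3 * 4 * 4, dtype=np.uint8).reshape(2, 3, 4, 4)

as_array = a.astype(np.float32)
as_lists = as_array.tolist()

# Indexing the array yields NumPy scalars, the item type named in the RuntimeError.
first_item = as_array[0, 0, 0, 0]
print(type(first_item).__name__)  # float32

# tolist() returns nested Python lists of built-in floats instead.
print(type(as_lists[0][0][0][0]).__name__)  # float

# Each of these elements becomes a separate Python float object, so the
# workaround costs far more memory and time than converting the array directly.
n_elements = as_array.size
print(n_elements)  # 96
```

For a real batch of 224x224 RGB images the same conversion creates millions of Python objects per batch.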

Yes, CUDA is up and running: both torch.cuda.is_available() and torch.backends.cudnn.enabled return True.

What is the minimum required PyTorch version? Any help debugging and resolving this issue would be appreciated.


Last time I had some issues with the latest PyTorch version (master branch), but I gave it another try and am now running the latest torch version:
>>> torch.__version__
'0.4.0a0+60a16e5'
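The thread does not establish an official minimum version; it only shows that 0.3.1 failed here while a 0.4.0 pre-release built from master got past the core.py error. If a script should fail early on older builds, a stdlib-only check like the sketch below can parse strings such as '0.4.0a0+60a16e5'. The helper names and the (0, 4) threshold are assumptions drawn from this thread, not from the fastai documentation.

```python
import re


def release_tuple(version):
    """Extract (major, minor, patch) from strings like '0.4.0a0+60a16e5'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


# Hypothetical threshold: the oldest release line that worked in this thread.
MIN_WORKING = (0, 4)


def looks_new_enough(version, minimum=MIN_WORKING):
    # release_tuple drops pre-release tags such as 'a0', so 0.4.0a0 counts as 0.4.
    return release_tuple(version)[:len(minimum)] >= minimum
```

It would be called as looks_new_enough(torch.__version__) right after importing torch.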

There is no error from fastai/core.py anymore, but I am still having trouble running the lesson 1 code (I haven’t checked any other examples/lessons).
I compiled WITH_NUMPY support. Could this be what fixed the error? torch.from_numpy() is now available (but I don’t know whether it is used at all).

My 4GB of system RAM is fully used, and I think this is the root of the problem… The GPU uses only 450MB of its 4GB.

Lesson 1 spawned many threads (15 python3 threads in total) and used all 4GB of RAM plus the 10GB of swap I had just added…

Setting num_workers=0 did the trick. It is working now!

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), num_workers=0)
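Because more workers mean more batches being prepared at once, one option on a small machine is to derive num_workers from the memory actually available rather than hard-coding it. The sketch below parses the MemAvailable field of /proc/meminfo (present on recent Linux kernels); the function names and the 1 GiB per-worker budget are hypothetical and would need tuning by watching memory use, as done in this thread.

```python
import os


def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])  # line looks like 'MemAvailable:  2048000 kB'
    return None


def suggest_num_workers(meminfo_text, per_worker_kib=1024 * 1024, cap=None):
    """Hypothetical heuristic: one worker per per_worker_kib of available memory.

    Returns 0 (load data in the main process) when memory is tight or unknown,
    and never more than the CPU count, or `cap` if one is given.
    """
    available = mem_available_kib(meminfo_text)
    if available is None:
        return 0
    limit = cap if cap is not None else (os.cpu_count() or 1)
    return max(0, min(limit, available // per_worker_kib))
```

The result, computed from the contents of /proc/meminfo, could then be passed as num_workers to ImageClassifierData.from_paths in place of the fixed 0.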

I am writing this monologue in case someone stumbles upon the same problem in the future.

Edit: Running the code from the Jupyter notebook does not overload my system memory, and it works with num_workers=8.