Since TensorFlow-GPU requires CUDA 9.0 and not 9.1, is there any way to run it alongside Fast.AI (on Paperspace)?

xjdeng · March 14, 2018, 3:17am

I’ve tried running tensorflow-gpu on my Paperspace instance where I installed Fast.AI, but got error messages each time. I can’t remember what they were, but I later found out it’s because I’m running CUDA 9.1, while the latest TensorFlow (1.6) still only runs on CUDA 9.0.
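One quick way to confirm this kind of mismatch, as a rough sketch: ask the dynamic loader whether the CUDA 9.0 runtime libraries that a prebuilt TensorFlow 1.6 wheel links against can be found at all. The library names below (`libcudart.so.9.0`, `libcublas.so.9.0`) follow the usual Linux soname convention and are an assumption about a typical install, not something taken from the error messages in this thread.

```python
import ctypes


def can_load(libname):
    """Return True if the dynamic loader can find and open `libname`."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        # Raised when the shared library is not on the loader's search path
        return False
    return True


if __name__ == "__main__":
    # Assumed sonames for the CUDA 9.0 runtime and cuBLAS on Linux
    for lib in ("libcudart.so.9.0", "libcublas.so.9.0"):
        print(lib, "found" if can_load(lib) else "NOT found")
```

On a machine that only has CUDA 9.1 installed, you would expect both lines to report "NOT found".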

Is there any way I can get both Fast.AI AND Tensorflow to run on the same Paperspace instance, perhaps by installing both CUDA 9.1 and 9.0?
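Installing both toolkits side by side should be possible in principle, because the NVIDIA installers default to versioned directories such as `/usr/local/cuda-9.0` and `/usr/local/cuda-9.1`. Below is a minimal sketch, assuming those default paths, of building a per-process environment that puts one toolkit's `bin` and `lib64` first. That way a TensorFlow process could see 9.0 while the Fast.AI environment keeps using 9.1. The function name `cuda_env` is hypothetical. This only affects library lookup, and the NVIDIA driver still has to be new enough for the newer toolkit.

```python
import os


def cuda_env(version, base_env=None, root="/usr/local"):
    """Return a copy of `base_env` with CUDA `version` first on PATH and LD_LIBRARY_PATH.

    Assumes the toolkit lives at <root>/cuda-<version> (the installer default).
    """
    env = dict(os.environ if base_env is None else base_env)
    cuda_home = os.path.join(root, "cuda-" + version)

    def prepend(var, path):
        # Put `path` in front of any existing value so it wins the lookup
        old = env.get(var, "")
        env[var] = path + (os.pathsep + old if old else "")

    prepend("PATH", os.path.join(cuda_home, "bin"))
    prepend("LD_LIBRARY_PATH", os.path.join(cuda_home, "lib64"))
    env["CUDA_HOME"] = cuda_home  # conventional variable read by some build tools
    return env
```

For example, `subprocess.run([sys.executable, "my_tf_script.py"], env=cuda_env("9.0"))` would start a (hypothetical) TensorFlow script against 9.0 without touching the rest of the system.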

jeff · March 16, 2018, 1:03am

You can compile TensorFlow from source. That’s what I’ve been doing on my personal desktop machines, which currently have CUDA 9.1.

Combalgorythm · March 16, 2018, 1:26am

How did you install tensorflow-gpu on Paperspace: conda install, pip install, or compiled from source? I guess the fastai env on the Paperspace machine is using conda Python. You can do a quick check of which Python environment your system is using. If it’s conda Python, then I guess installing tensorflow-gpu from conda might solve the problem. Also, it would be much more helpful if you could provide screenshots or the exact errors.
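A quick sketch of that check. Run it from inside the Python you use for Fast.AI. It prints the interpreter path and uses the presence of a `conda-meta` directory under `sys.prefix` as a heuristic for a conda environment. Running `which python` in a shell gives similar information. Installing with `python -m pip` rather than a bare `pip` also guarantees the package goes into that same interpreter.

```python
import os
import sys


def describe_python_env():
    """Report which interpreter is running and whether it lives in a conda env."""
    # conda creates a conda-meta/ directory inside every environment prefix
    is_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda": is_conda,
    }


if __name__ == "__main__":
    for key, value in describe_python_env().items():
        print(key, "=", value)
```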

xjdeng · March 17, 2018, 6:09pm

I highly doubted that installing from conda instead of pip would make any difference, but I tried anyway:

pip uninstall tensorflow-gpu
conda install -c aaronzs tensorflow-gpu

Note that aaronzs is the only channel on Anaconda Cloud with a tensorflow-gpu build of 1.5 or later, and versions before 1.5 can’t run on CUDA 9.

And sure enough, this version still gives me the same error.

Combalgorythm · March 17, 2018, 6:15pm

Can you post a screenshot of the error that you are getting?

Combalgorythm · March 17, 2018, 6:23pm

It also seems like people are facing a lot of problems with CUDA 9.1; there are even recent issues opened on GitHub. Please go through the following:

https://github.com/tensorflow/tensorflow/issues/15656 (see gunan’s second-to-last comment)
https://github.com/tensorflow/tensorflow/issues/15140

Maybe you can try building from source, as jeff suggested above.

xjdeng · March 18, 2018, 2:27pm

Yes, it works now!

I followed this guide.

But I jumped straight to the Bazel section, since most of the libraries were already preinstalled.

After finishing that section, jump back down to the “Tensorflow Install” section and run those commands.
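For reference, the core Bazel build and packaging steps in TensorFlow’s own 1.x build-from-source instructions look roughly like the following. Here they are wrapped in a small Python helper so the commands can be printed before running. This is a sketch, not the exact contents of the guide mentioned above. It assumes you are inside a TensorFlow source checkout and have already run `./configure` (answering 9.1 when asked for the CUDA version). The helper names `build_commands` and `run` are hypothetical.

```python
import subprocess

# Output directory used in this thread for the built wheel
PKG_DIR = "/tmp/tensorflow_pkg"


def build_commands(pkg_dir=PKG_DIR, cuda=True):
    """Return the Bazel build + packaging commands for a TF 1.x source checkout."""
    build = ["bazel", "build", "--config=opt"]
    if cuda:
        build.append("--config=cuda")  # build GPU kernels against the configured CUDA
    build.append("//tensorflow/tools/pip_package:build_pip_package")
    package = ["bazel-bin/tensorflow/tools/pip_package/build_pip_package", pkg_dir]
    return [build, package]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(build_commands())  # dry run: only shows what would be executed
```

The Bazel build step typically takes a long time, so printing first makes it easier to double-check the flags.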

When you go to /tmp/tensorflow_pkg, make sure you install the .whl file that your build actually produced there, not the one named in the example.
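The wheel’s file name encodes the TensorFlow version, Python version and platform of your particular build, so it will rarely match the name in a guide. Below is a small sketch that picks the most recently built `.whl` in `/tmp/tensorflow_pkg`. It then installs that file with the current interpreter’s pip (`python -m pip`), so it lands in the same environment, e.g. the fastai conda env. The helper names are hypothetical, and the install only happens when `dry_run=False`.

```python
import glob
import os
import subprocess
import sys


def newest_wheel(pkg_dir="/tmp/tensorflow_pkg"):
    """Return the most recently modified .whl in pkg_dir, or None if there is none."""
    wheels = glob.glob(os.path.join(pkg_dir, "*.whl"))
    if not wheels:
        return None
    return max(wheels, key=os.path.getmtime)


def install_wheel(path, dry_run=True):
    """Install `path` with this interpreter's pip (only prints the command when dry_run)."""
    cmd = [sys.executable, "-m", "pip", "install", path]
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    whl = newest_wheel()
    if whl is None:
        print("no .whl found; did the build_pip_package step succeed?")
    else:
        install_wheel(whl)
```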

If anyone wants the .whl, I can probably post it to GitHub or Dropbox, but I can’t guarantee it’ll work on your machine, though the chances are better if you’re also using Paperspace.

dkobran · April 9, 2018, 7:24pm

Just a quick update: there’s a new version of our ML in a Box template that includes CUDA 9. It’s based on a slightly newer version of TensorFlow (1.7), but hopefully that won’t be a problem. We won’t be updating the Fast.ai template yet, since Jeremy manages it, but you can use his script to pull the Fast.ai requirements into the ML in a Box template.

Hope that helps.

quaxton · August 16, 2018, 4:22am

I also followed the guide you linked above.

However, when I run keras_lesson1.ipynb, I get the following error:

---------------------------------------------------------------------
AttributeError                      Traceback (most recent call last)
<ipython-input-21-617ef3978760> in <module>()
      1 base_model = ResNet50(weights='imagenet',include_top=False)
      2 x = base_model.output
----> 3 x = GlobalAveragePooling2D()(x)
      4 x = Dense(1024, activation='relu')(x)
      5 predictions = Dense(1, activation='sigmoid')(x)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    441 
    442             # Handle mask propagation.
--> 443             previous_mask = _collect_previous_mask(inputs)
    444             user_kwargs = copy.copy(kwargs)
    445             if not is_all_none(previous_mask):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/keras/engine/base_layer.py in _collect_previous_mask(input_tensors)
   1309             inbound_layer, node_index, tensor_index = x._keras_history
   1310             node = inbound_layer._inbound_nodes[node_index]
-> 1311             mask = node.output_masks[tensor_index]
   1312             masks.append(mask)
   1313         else:

AttributeError: 'Node' object has no attribute 'output_masks'

The error comes from this code:

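from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D

base_model = ResNet50(weights='imagenet', include_top=False)  # ImageNet weights, no classifier head
x = base_model.output
x = GlobalAveragePooling2D()(x)  # pool the final feature maps to one vector per image
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)  # single sigmoid output for binary classification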

This didn’t happen when I installed TensorFlow with CPU support only. Any suggestions?
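The error only appeared after switching to the GPU build. One thing worth checking, offered as an assumption and not a confirmed diagnosis, is whether the source-built TensorFlow and the installed Keras are versions that expect each other. It is also worth checking whether the notebook mixes objects from `keras` and `tensorflow.keras`. Below is a minimal sketch that reports what the notebook kernel actually imports. The helper name `module_versions` is hypothetical.

```python
import importlib


def module_versions(names):
    """Map each module name to its __version__, 'unknown', or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

In a notebook cell, `print(module_versions(["tensorflow", "keras"]))` shows which versions the kernel picked up. You can then compare them with the versions from the earlier CPU-only setup.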