Crestle [easier alternative to AWS]


(Anurag Goel) #125

I just noticed Crestle users have created well over 10,000 notebooks as of this morning. Thank you to everyone who tried it out and provided feedback. Hopefully it will continue to be helpful!


#126

My code was running correctly on Crestle before.

But recently I hit the same problem, so I changed the Theano version to 0.9.0.

That error is gone, but now I get another error when I fit the model.

1 #include <Python.h>
2 #include <iostream>
3 #include "theano_mod_helper.h"
4 #include "cuda_ndarray.cuh"
5 #include <math.h>
6 #include <numpy/arrayobject.h>
7 #include <numpy/arrayscalars.h>
8 #include "cudnn.h"
9 #include "cudnn_helper.h"
10 //////////////////////
11 ////  Support Code
12 //////////////////////
13 
14 void _capsule_destructor(PyObject *o) {
15     void *d = PyCapsule_GetContext(o);
16     void *p = PyCapsule_GetPointer(o, NULL);
17     void (*f)(void *) = (void (*)(void *))d;
18     if (f != NULL) f(p);
19 }
20 
21 
22 static cudnnHandle_t _handle = NULL;
23 

// the code …
===============================
mod.cu(77): error: identifier "cudnnSetFilterNdDescriptor_v4" is undefined
mod.cu(326): warning: conversion from a string literal to "char *" is deprecated
mod.cu(329): warning: conversion from a string literal to "char *" is deprecated
mod.cu(332): warning: conversion from a string literal to "char *" is deprecated
mod.cu(335): warning: conversion from a string literal to "char *" is deprecated
mod.cu(338): warning: conversion from a string literal to "char *" is deprecated
mod.cu(341): warning: conversion from a string literal to "char *" is deprecated
mod.cu(345): warning: conversion from a string literal to "char *" is deprecated
1 error detected in the compilation of "/tmp/tmpxft_00000113_00000000-9_mod.cpp1.ii".

['nvcc', '-shared', '-O3', '-Xlinker', '-rpath,/usr/local/cuda/lib64', '-arch=sm_37', '-m64', '-Xcompiler', '-fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-I/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-I/usr/local/cuda/include', '-I/home/nbuser/.local/lib/python2.7/site-packages/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/dist-packages/numpy/core/include', '-I/usr/include/python2.7', '-I/home/nbuser/.local/lib/python2.7/site-packages/theano/gof', '-L/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-L/usr/lib', '-o', '/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmplwhdaa/ea4e203b6529466794536f8a1bfa77ae.so', 'mod.cu', '-lcudart', '-lcublas', '-lcuda_ndarray', '-lcudnn', '-lpython2.7']

Exception                                 Traceback (most recent call last)
<ipython-input-13-88a1200d7297> in <module>()
----> 1 fit_model(model, batches, val_batches, 2)

<ipython-input-12-0e3ad86f0929> in fit_model(model, batches, val_batches, nb_epoch)
      1 def fit_model(model, batches, val_batches, nb_epoch):
      2     model.fit_generator(batches, batches.nb_sample, nb_epoch,
----> 3                    validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

/home/nbuser/.local/lib/python2.7/site-packages/keras/models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch, **kwargs)
    933                                         nb_worker=nb_worker,
    934                                         pickle_safe=pickle_safe,
--> 935                                         initial_epoch=initial_epoch)
    936 
    937     def evaluate_generator(self, generator, val_samples,

/home/nbuser/.local/lib/python2.7/site-packages/keras/engine/training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch)
   1452 
   1453         do_validation = bool(validation_data)
-> 1454         self._make_train_function()
   1455         if do_validation:
   1456             self._make_test_function()

/home/nbuser/.local/lib/python2.7/site-packages/keras/engine/training.pyc in _make_train_function(self)
    765                                              [self.total_loss] + self.metrics_tensors,
    766                                              updates=updates,
--> 767                                              **self._function_kwargs)
    768 
    769     def _make_test_function(self):

/home/nbuser/.local/lib/python2.7/site-packages/keras/backend/theano_backend.pyc in function(inputs, outputs, updates, **kwargs)
    967                 msg = 'Invalid argument "%s" passed to K.function' % key
    968                 raise ValueError(msg)
--> 969     return Function(inputs, outputs, updates=updates, **kwargs)
    970 
    971 

/home/nbuser/.local/lib/python2.7/site-packages/keras/backend/theano_backend.pyc in __init__(self, inputs, outputs, updates, **kwargs)
    953                                         allow_input_downcast=True,
    954                                         on_unused_input='ignore',
--> 955                                         **kwargs)
    956 
    957     def __call__(self, inputs):

/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/function.pyc in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
    324                    on_unused_input=on_unused_input,
    325                    profile=profile,
--> 326                    output_keys=output_keys)
    327     # We need to add the flag check_aliased inputs if we have any mutable or
    328     # borrowed used defined inputs

/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/pfunc.pyc in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
    484                          accept_inplace=accept_inplace, name=name,
    485                          profile=profile, on_unused_input=on_unused_input,
--> 486                          output_keys=output_keys)
    487 
    488 

/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/function_module.pyc in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys)
   1793                    on_unused_input=on_unused_input,
   1794                    output_keys=output_keys).create(
-> 1795             defaults)
   1796 
   1797     t2 = time.time()

/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/function_module.pyc in create(self, input_storage, trustme, storage_map)
   1659             theano.config.traceback.limit = theano.config.traceback.compile_limit
   1660             _fn, _i, _o = self.linker.make_thunk(
-> 1661                 input_storage=input_storage_lists, storage_map=storage_map)
   1662         finally:
   1663             theano.config.traceback.limit = limit_orig

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/link.pyc in make_thunk(self, input_storage, output_storage, storage_map)
    697         return self.make_all(input_storage=input_storage,
    698                              output_storage=output_storage,
--> 699                              storage_map=storage_map)[:3]
    700 
    701     def make_all(self, input_storage, output_storage):

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/vm.pyc in make_all(self, profiler, input_storage, output_storage, storage_map)
   1045                                                  compute_map,
   1046                                                  no_recycling,
-> 1047                                                  impl=impl))
   1048                 linker_make_thunk_time[node] = time.time() - thunk_start
   1049                 if not hasattr(thunks[-1], 'lazy'):

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/op.pyc in make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
    933             try:
    934                 return self.make_c_thunk(node, storage_map, compute_map,
--> 935                                          no_recycling)
    936             except (NotImplementedError, utils.MethodNotDefined):
    937                 # We requested the c code, so don't catch the error.

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/op.pyc in make_c_thunk(self, node, storage_map, compute_map, no_recycling)
    837         _logger.debug('Trying CLinker.make_thunk')
    838         outputs = cl.make_thunk(input_storage=node_input_storage,
--> 839                                 output_storage=node_output_storage)
    840         fill_storage, node_input_filters, node_output_filters = outputs
    841 

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in make_thunk(self, input_storage, output_storage, storage_map, keep_lock)
   1188         cthunk, in_storage, out_storage, error_storage = self.__compile__(
   1189             input_storage, output_storage, storage_map,
-> 1190             keep_lock=keep_lock)
   1191 
   1192         res = _CThunk(cthunk, init_tasks, tasks, error_storage)

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in __compile__(self, input_storage, output_storage, storage_map, keep_lock)
   1129                                     output_storage,
   1130                                     storage_map,
-> 1131                                     keep_lock=keep_lock)
   1132         return (thunk,
   1133                 [link.Container(input, storage) for input, storage in

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in cthunk_factory(self, error_storage, in_storage, out_storage, storage_map, keep_lock)
   1584                 node.op.prepare_node(node, storage_map, None, 'c')
   1585             module = get_module_cache().module_from_key(
-> 1586                 key=key, lnk=self, keep_lock=keep_lock)
   1587 
   1588         vars = self.inputs + self.outputs + self.orphans

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cmodule.pyc in module_from_key(self, key, lnk, keep_lock)
   1157             try:
   1158                 location = dlimport_workdir(self.dirname)
-> 1159                 module = lnk.compile_cmodule(location)
   1160                 name = module.__file__
   1161                 assert name.startswith(location)

/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in compile_cmodule(self, location)
   1487                 lib_dirs=self.lib_dirs(),
   1488                 libs=libs,
-> 1489                 preargs=preargs)
   1490         except Exception as e:
   1491             e.args += (str(self.fgraph),)

/home/nbuser/.local/lib/python2.7/site-packages/theano/sandbox/cuda/nvcc_compiler.pyc in compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, rpaths, py_module, hide_symbols)
    403             print(cmd)
    404             raise Exception('nvcc return status', p.returncode,
--> 405                             'for cmd', ' '.join(cmd))
    406         elif config.cmodule.compilation_warning and nvcc_stdout:
    407             print(nvcc_stdout)

Exception: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'nvcc return status', 2, 'for cmd', 'nvcc -shared -O3 -Xlinker -rpath,/usr/local/cuda/lib64 -arch=sm_37 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -I/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -I/usr/local/cuda/include -I/home/nbuser/.local/lib/python2.7/site-packages/theano/sandbox/cuda -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -I/home/nbuser/.local/lib/python2.7/site-packages/theano/gof -L/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -L/usr/lib -o /home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmplwhdaa/ea4e203b6529466794536f8a1bfa77ae.so mod.cu -lcudart -lcublas -lcuda_ndarray -lcudnn -lpython2.7', "[GpuDnnConv{algo='small', inplace=True}(<CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CDataType{cudnnConvolutionDescriptor_t}>, Constant{1.0}, Constant{0.0})]")

Has anyone had the same error? How can I solve it?
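In a log this long, the real failure is easy to miss among the warnings. The sketch below separates nvcc errors from warnings, using lines copied from the output above. The function name and the regular expression are illustrative assumptions based on the line format seen in this post, not part of Theano or nvcc.

```python
import re

# Sample lines copied from the nvcc output in the post above.
NVCC_LOG = """\
mod.cu(77): error: identifier "cudnnSetFilterNdDescriptor_v4" is undefined
mod.cu(326): warning: conversion from a string literal to "char *" is deprecated
mod.cu(329): warning: conversion from a string literal to "char *" is deprecated
"""

# Assumed format: "<file>(<line>): <error|warning>: <message>", as printed here.
LINE_RE = re.compile(
    r'^(?P<file>[^()\s]+)\((?P<line>\d+)\): '
    r'(?P<severity>error|warning): (?P<message>.*)$'
)


def split_diagnostics(log_text):
    """Return (errors, warnings), each a list of (line_number, message)."""
    errors, warnings = [], []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip anything that is not a diagnostic line
        entry = (int(match.group("line")), match.group("message"))
        if match.group("severity") == "error":
            errors.append(entry)
        else:
            warnings.append(entry)
    return errors, warnings


errors, warnings = split_diagnostics(NVCC_LOG)
for line_number, message in errors:
    print("line %d: %s" % (line_number, message))
```

Here the only error is the undefined `cudnnSetFilterNdDescriptor_v4` identifier. That would be consistent with the generated Theano code not matching the installed cuDNN headers, though the log alone does not confirm this.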


(Anurag Goel) #127

Now that the second version of the course is online (see "Unofficial release of part 1 v2"), Crestle has been updated to the latest versions of Theano and cuDNN. I’d recommend upgrading your code to use Theano 1.0, which is installed on Crestle.
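To check whether a pinned install such as the 0.9.0 mentioned earlier is older than the recommended 1.0, a small comparison helper may be enough. This is a minimal sketch: `version_tuple` and `needs_upgrade` are hypothetical helper names, and the parsing only handles simple dotted version strings.

```python
def version_tuple(version):
    """Turn '0.9.0' into (0, 9, 0); a non-numeric suffix such as 'rc1' is dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # stop at the first piece with no leading digits, e.g. 'dev'
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed, minimum="1.0"):
    """True when the installed version is older than the minimum."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


print(needs_upgrade("0.9.0"))  # True
print(needs_upgrade("1.0.1"))  # False
```

In a notebook where Theano imports cleanly, the installed version string is available as `theano.__version__` and can be passed straight to `needs_upgrade`.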


(Rengarajan Bashyam) #128

@anurag, even with GPU enabled, the performance is very slow… Jeremy says the model trains in a few minutes, but it’s much slower here… even opening a notebook takes a long, long time… it used to be fast… any reason why?


(Anurag Goel) #129

This is mostly due to increased load over the last several days, but I’ve also just tweaked some settings that should improve things.


(Rengarajan Bashyam) #130

It’s just gotten worse… I can’t even open a new notebook… it just gives me a blank screen… and opening an existing notebook is also extremely slow.


(Anurag Goel) #131

That’s because I’m running some maintenance on the cluster. Stay tuned.


(Anurag Goel) #132

Everything is back to normal now. If you run into issues, DM me.


(Rengarajan Bashyam) #133

Yes. Things are super fast now!!! Thank you.


#134

It takes me about 15 minutes to fit the model in lesson 1 v2, not under 20 seconds. Is this normal on Crestle?
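A gap that large can have several causes. One thing that may be worth ruling out, assuming the lesson 1 v2 notebook runs on PyTorch, is that the session is not actually using the GPU. The sketch below only reports what PyTorch can see; `gpu_visible` is a hypothetical helper name, and a `True` result does not by itself explain or fix slow training.

```python
def gpu_visible():
    """Return True/False if PyTorch is installed, or None if it is not."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


status = gpu_visible()
if status is None:
    print("PyTorch is not installed in this environment")
elif status:
    print("PyTorch can see a GPU")
else:
    print("PyTorch is running on CPU only")
```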


(Balaji Balasubramanian) #135

Under Data Storage (Disk), my usage is shown as 51.14 GB, but I deleted a lot of my data yesterday. I think my usage should be less than 1 GB. Has anyone else encountered this problem?


(Anurag Goel) #136

@balajib26 your .local directory is 51GB. If you delete it, your disk usage should be updated in a few hours.


(Balaji Balasubramanian) #137

@anurag I have deleted all my data, but my disk usage still shows 51GB. My email id is balajib26@gmail.com. Please check it; I have been incurring additional cost for the past 5 days!


[Attached screenshots: crestle2, crestle3]


(Anurag Goel) #138

@balajib26, you have to run rm -rf .* to delete all the hidden (dot) directories. As I said above, there is a hidden ‘.local’ directory, which you can see by running ls -al in the console. Once you have deleted the hidden directories you don’t need, run du -sh . to get a final total. Crestle will reflect your updated usage within 4-5 hours.
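Before deleting anything, it can help to see which top-level entries, hidden ones included, take up the space. The sketch below does that from a notebook cell. The function names are hypothetical, and it sums apparent file sizes, so the totals may differ somewhat from what du reports.

```python
import os


def tree_size(path):
    """Total bytes of a file, or of all regular files under a directory."""
    if os.path.islink(path):
        return 0  # do not follow or count symlinks
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def usage_by_entry(directory):
    """Return [(size_bytes, name), ...] per top-level entry, largest first.

    os.listdir includes hidden entries such as '.local', which a plain
    'ls' hides.
    """
    rows = [(tree_size(os.path.join(directory, name)), name)
            for name in os.listdir(directory)]
    return sorted(rows, reverse=True)
```

Calling `usage_by_entry(os.path.expanduser("~"))` and printing the first few rows should show whether a hidden directory such as `.local` accounts for most of the usage.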


(Balaji Balasubramanian) #139

@anurag Thanks. It worked :+1:


(Balaji Balasubramanian) #140


@anurag I am not able to delete all the files. It is still consuming 125MB.


(Anurag Goel) #141

Try rm -rf instead, so it can delete the git directory as well.


(Anurag Goel) #143

Crestle has been updated to the latest versions of TensorFlow (1.8), PyTorch (0.4), CUDA 9.0 and cuDNN 7, which should make certain GPU operations much faster.


(Qin Lu) #144

Hello,

I am wondering how much I should expect to spend on GPU computing for Part 1.

Thanks,


(Anurag Goel) #145

I’d expect at least 20 hours of GPU training.
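Turning that into a budget is a single multiplication once you know the hourly GPU price. The thread does not state a price, so the rate below is a made-up placeholder and not Crestle's actual pricing. Replace it with the current figure before relying on the result.

```python
def estimate_gpu_cost(hours, rate_per_hour):
    """Cost of GPU time: hours multiplied by the hourly rate."""
    if hours < 0 or rate_per_hour < 0:
        raise ValueError("hours and rate_per_hour must be non-negative")
    return hours * rate_per_hour


# HYPOTHETICAL rate for illustration only; NOT taken from Crestle's pricing.
PLACEHOLDER_RATE_PER_HOUR = 0.50

# 20 hours is the lower bound given in the reply above.
cost = estimate_gpu_cost(20, PLACEHOLDER_RATE_PER_HOUR)
print("At least %.2f for 20 GPU hours at the placeholder rate" % cost)
```

Because 20 hours is described as a minimum, the result is a floor on GPU cost and not an expected total.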