I just noticed Crestle users have created well over 10,000 notebooks as of this morning. Thank you to everyone who tried it out and provided feedback. Hopefully it will continue to be helpful!
My code was running correctly on Crestle before.
But recently I ran into the same problem, so I changed the Theano version to 0.9.0.
That error has gone, but I get another error when I fit the model.
1 #include <Python.h>
2 #include <iostream>
3 #include "theano_mod_helper.h"
4 #include "cuda_ndarray.cuh"
5 #include <math.h>
6 #include <numpy/arrayobject.h>
7 #include <numpy/arrayscalars.h>
8 #include "cudnn.h"
9 #include "cudnn_helper.h"
10 //////////////////////
11 //// Support Code
12 //////////////////////
13
14 void _capsule_destructor(PyObject *o) {
15 void *d = PyCapsule_GetContext(o);
16 void *p = PyCapsule_GetPointer(o, NULL);
17 void (*f)(void *) = (void (*)(void *))d;
18 if (f != NULL) f(p);
19 }
20
21
22 static cudnnHandle_t _handle = NULL;
23
// the code …
===============================
mod.cu(77): error: identifier "cudnnSetFilterNdDescriptor_v4" is undefined
mod.cu(326): warning: conversion from a string literal to "char *" is deprecated
mod.cu(329): warning: conversion from a string literal to "char *" is deprecated
mod.cu(332): warning: conversion from a string literal to "char *" is deprecated
mod.cu(335): warning: conversion from a string literal to "char *" is deprecated
mod.cu(338): warning: conversion from a string literal to "char *" is deprecated
mod.cu(341): warning: conversion from a string literal to "char *" is deprecated
mod.cu(345): warning: conversion from a string literal to "char *" is deprecated
1 error detected in the compilation of "/tmp/tmpxft_00000113_00000000-9_mod.cpp1.ii".
['nvcc', '-shared', '-O3', '-Xlinker', '-rpath,/usr/local/cuda/lib64', '-arch=sm_37', '-m64', '-Xcompiler', '-fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-I/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-I/usr/local/cuda/include', '-I/home/nbuser/.local/lib/python2.7/site-packages/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/dist-packages/numpy/core/include', '-I/usr/include/python2.7', '-I/home/nbuser/.local/lib/python2.7/site-packages/theano/gof', '-L/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-L/usr/lib', '-o', '/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmplwhdaa/ea4e203b6529466794536f8a1bfa77ae.so', 'mod.cu', '-lcudart', '-lcublas', '-lcuda_ndarray', '-lcudnn', '-lpython2.7']
Exception Traceback (most recent call last)
<ipython-input-13-88a1200d7297> in <module>()
----> 1 fit_model(model, batches, val_batches, 2)
<ipython-input-12-0e3ad86f0929> in fit_model(model, batches, val_batches, nb_epoch)
1 def fit_model(model, batches, val_batches, nb_epoch):
2 model.fit_generator(batches, batches.nb_sample, nb_epoch,
----> 3 validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
/home/nbuser/.local/lib/python2.7/site-packages/keras/models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch, **kwargs)
933 nb_worker=nb_worker,
934 pickle_safe=pickle_safe,
--> 935 initial_epoch=initial_epoch)
936
937 def evaluate_generator(self, generator, val_samples,
/home/nbuser/.local/lib/python2.7/site-packages/keras/engine/training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch)
1452
1453 do_validation = bool(validation_data)
-> 1454 self._make_train_function()
1455 if do_validation:
1456 self._make_test_function()
/home/nbuser/.local/lib/python2.7/site-packages/keras/engine/training.pyc in _make_train_function(self)
765 [self.total_loss] + self.metrics_tensors,
766 updates=updates,
--> 767 **self._function_kwargs)
768
769 def _make_test_function(self):
/home/nbuser/.local/lib/python2.7/site-packages/keras/backend/theano_backend.pyc in function(inputs, outputs, updates, **kwargs)
967 msg = 'Invalid argument "%s" passed to K.function' % key
968 raise ValueError(msg)
--> 969 return Function(inputs, outputs, updates=updates, **kwargs)
970
971
/home/nbuser/.local/lib/python2.7/site-packages/keras/backend/theano_backend.pyc in __init__(self, inputs, outputs, updates, **kwargs)
953 allow_input_downcast=True,
954 on_unused_input='ignore',
--> 955 **kwargs)
956
957 def __call__(self, inputs):
/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/function.pyc in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
324 on_unused_input=on_unused_input,
325 profile=profile,
--> 326 output_keys=output_keys)
327 # We need to add the flag check_aliased inputs if we have any mutable or
328 # borrowed used defined inputs
/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/pfunc.pyc in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
484 accept_inplace=accept_inplace, name=name,
485 profile=profile, on_unused_input=on_unused_input,
--> 486 output_keys=output_keys)
487
488
/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/function_module.pyc in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys)
1793 on_unused_input=on_unused_input,
1794 output_keys=output_keys).create(
-> 1795 defaults)
1796
1797 t2 = time.time()
/home/nbuser/.local/lib/python2.7/site-packages/theano/compile/function_module.pyc in create(self, input_storage, trustme, storage_map)
1659 theano.config.traceback.limit = theano.config.traceback.compile_limit
1660 _fn, _i, _o = self.linker.make_thunk(
-> 1661 input_storage=input_storage_lists, storage_map=storage_map)
1662 finally:
1663 theano.config.traceback.limit = limit_orig
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/link.pyc in make_thunk(self, input_storage, output_storage, storage_map)
697 return self.make_all(input_storage=input_storage,
698 output_storage=output_storage,
--> 699 storage_map=storage_map)[:3]
700
701 def make_all(self, input_storage, output_storage):
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/vm.pyc in make_all(self, profiler, input_storage, output_storage, storage_map)
1045 compute_map,
1046 no_recycling,
-> 1047 impl=impl))
1048 linker_make_thunk_time[node] = time.time() - thunk_start
1049 if not hasattr(thunks[-1], 'lazy'):
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/op.pyc in make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
933 try:
934 return self.make_c_thunk(node, storage_map, compute_map,
--> 935 no_recycling)
936 except (NotImplementedError, utils.MethodNotDefined):
937 # We requested the c code, so don't catch the error.
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/op.pyc in make_c_thunk(self, node, storage_map, compute_map, no_recycling)
837 _logger.debug('Trying CLinker.make_thunk')
838 outputs = cl.make_thunk(input_storage=node_input_storage,
--> 839 output_storage=node_output_storage)
840 fill_storage, node_input_filters, node_output_filters = outputs
841
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in make_thunk(self, input_storage, output_storage, storage_map, keep_lock)
1188 cthunk, in_storage, out_storage, error_storage = self.__compile__(
1189 input_storage, output_storage, storage_map,
-> 1190 keep_lock=keep_lock)
1191
1192 res = _CThunk(cthunk, init_tasks, tasks, error_storage)
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in __compile__(self, input_storage, output_storage, storage_map, keep_lock)
1129 output_storage,
1130 storage_map,
-> 1131 keep_lock=keep_lock)
1132 return (thunk,
1133 [link.Container(input, storage) for input, storage in
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in cthunk_factory(self, error_storage, in_storage, out_storage, storage_map, keep_lock)
1584 node.op.prepare_node(node, storage_map, None, 'c')
1585 module = get_module_cache().module_from_key(
-> 1586 key=key, lnk=self, keep_lock=keep_lock)
1587
1588 vars = self.inputs + self.outputs + self.orphans
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cmodule.pyc in module_from_key(self, key, lnk, keep_lock)
1157 try:
1158 location = dlimport_workdir(self.dirname)
-> 1159 module = lnk.compile_cmodule(location)
1160 name = module.__file__
1161 assert name.startswith(location)
/home/nbuser/.local/lib/python2.7/site-packages/theano/gof/cc.pyc in compile_cmodule(self, location)
1487 lib_dirs=self.lib_dirs(),
1488 libs=libs,
-> 1489 preargs=preargs)
1490 except Exception as e:
1491 e.args += (str(self.fgraph),)
/home/nbuser/.local/lib/python2.7/site-packages/theano/sandbox/cuda/nvcc_compiler.pyc in compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, rpaths, py_module, hide_symbols)
403 print(cmd)
404 raise Exception('nvcc return status', p.returncode,
--> 405 'for cmd', ' '.join(cmd))
406 elif config.cmodule.compilation_warning and nvcc_stdout:
407 print(nvcc_stdout)
Exception: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'nvcc return status', 2, 'for cmd', 'nvcc -shared -O3 -Xlinker -rpath,/usr/local/cuda/lib64 -arch=sm_37 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -I/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -I/usr/local/cuda/include -I/home/nbuser/.local/lib/python2.7/site-packages/theano/sandbox/cuda -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -I/home/nbuser/.local/lib/python2.7/site-packages/theano/gof -L/home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -L/usr/lib -o /home/nbuser/.theano/compiledir_Linux-4.4-k8s-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmplwhdaa/ea4e203b6529466794536f8a1bfa77ae.so mod.cu -lcudart -lcublas -lcuda_ndarray -lcudnn -lpython2.7', "[GpuDnnConv{algo='small', inplace=True}(<CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CDataType{cudnnConvolutionDescriptor_t}>, Constant{1.0}, Constant{0.0})]")
Has anyone had the same error? How can I solve it?
Now that the second version of the course is online (see "Unofficial release of part 1 v2"), Crestle has been updated to the latest versions of Theano and cuDNN. I'd recommend upgrading your code to use Theano 1.0, which is installed on Crestle.
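An undefined `cudnnSetFilterNdDescriptor_v4` usually points to a mismatch between the Theano release and the cuDNN headers that nvcc picks up. As a rough sketch for checking which cuDNN the compiler sees, the snippet below reads the version macros from the header. The default path is an assumption taken from the `-I/usr/local/cuda/include` flag in the nvcc command above, and the helper names are made up for illustration. Newer cuDNN releases keep these macros in `cudnn_version.h` instead, so adjust the path if this returns None.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as: #define CUDNN_MAJOR 7
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        fields[name] = int(match.group(1))
    return (fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"])


def read_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read the header at path (an assumed location) and parse its version."""
    try:
        with open(path) as handle:
            return parse_cudnn_version(handle.read())
    except (IOError, OSError):
        # Header missing or unreadable: report "unknown" rather than failing.
        return None


if __name__ == "__main__":
    print("cuDNN header version: %s" % (read_cudnn_version(),))
```

Compare the result with `theano.__version__`. If the header reports a newer cuDNN than your Theano release was written for, upgrading Theano as suggested above is the likelier fix than downgrading cuDNN.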
@anurag, even with GPU enabled, the performance is very slow… Jeremy says the model trains in a few minutes, but it's much slower here… even opening a notebook takes a long, long time… it used to be fast… any reasons?
This is mostly due to increased load over the last several days, but I've also just tweaked some settings that should improve things.
It's just got worse… I can't even open a new notebook… it just gives me a blank screen… and opening an existing notebook is also extremely slow.
That's because I'm running some maintenance on the cluster. Stay tuned.
Everything is back to normal now. If you run into issues, DM me.
Yes. Things are super fast now!!! Thank you.
It takes me about 15 minutes to fit the model in lesson 1 v2, not under 20 seconds. Is this normal on Crestle?
In my Data Storage (Disk), my usage is shown as 51.14 GB, but I deleted a lot of my data yesterday. I feel that my data usage should be less than 1 GB. Has anyone else encountered this problem?
@balajib26 your .local directory is 51GB. If you delete it, your disk usage should be updated in a few hours.
@anurag I have deleted all my data, but my disk usage still shows 51GB. My email id is balajib26@gmail.com. Please check it; I have been incurring additional cost for the past 5 days!
@balajib26, you have to run rm -rf .* to delete all the . directories. As I said above, there is a hidden ".local" directory which you can view via ls -al in the console. Once you delete all the hidden directories you don't need, run du -sh . to get a final total. Crestle will reflect your updated usage within 4-5 hours.
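Before deleting anything, it can help to see which hidden directories actually take up the space. The sketch below is one way to do that from a notebook cell. The function names are made up for illustration, and it sums apparent file sizes, so the totals can differ somewhat from what du reports in disk blocks.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable: leave it out of the total.
                pass
    return total


def hidden_dir_sizes(home):
    """Map each hidden top-level directory in home to its size in bytes."""
    sizes = {}
    for entry in os.listdir(home):
        full = os.path.join(home, entry)
        if entry.startswith(".") and os.path.isdir(full) and not os.path.islink(full):
            sizes[entry] = dir_size_bytes(full)
    return sizes
```

Calling `hidden_dir_sizes(os.path.expanduser("~"))` returns a dictionary such as `{".local": ..., ".theano": ...}`, so you can sort it by size and delete only the directories you no longer need.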
Try rm -rf instead, so it can delete the git directory as well.
Crestle has been updated to the latest versions of TensorFlow (1.8), PyTorch (0.4), CUDA 9.0 and cuDNN 7, which should make certain GPU operations much faster.
Hello,
I am wondering how much I should expect to spend on GPU computing for Part 1.
Thanks,
I'd expect at least 20 hours of GPU training.