A problem with the VGG model in lesson 1


#1

When I run the following code in lesson1.ipynb:

vgg = Vgg16()
# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)

I get the following error:

Downloading data from http://files.fast.ai/models/imagenet_class_index.json
16384/35363 [============>.................] - ETA: 0s
Found 16 images belonging to 2 classes.
Found 8 images belonging to 2 classes.
Epoch 1/1
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-dcf03b063aee> in <module>()
      5 val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
      6 vgg.finetune(batches)
----> 7 vgg.fit(batches, val_batches, nb_epoch=1)

/home/bnrc/lee/jupyter/fast.ai/courses/deeplearning1/nbs/vgg16.pyc in fit(self, batches, val_batches, nb_epoch)
    211         """
    212         self.model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=nb_epoch,
--> 213                 validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
    214 
    215 

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, **kwargs)
    880                                         max_q_size=max_q_size,
    881                                         nb_worker=nb_worker,
--> 882                                         pickle_safe=pickle_safe)
    883 
    884     def evaluate_generator(self, generator, val_samples,

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/engine/training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe)
   1459                     outs = self.train_on_batch(x, y,
   1460                                                sample_weight=sample_weight,
-> 1461                                                class_weight=class_weight)
   1462                 except:
   1463                     _stop.set()

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/engine/training.pyc in train_on_batch(self, x, y, sample_weight, class_weight)
   1237             ins = x + y + sample_weights
   1238         self._make_train_function()
-> 1239         outputs = self.train_function(ins)
   1240         if len(outputs) == 1:
   1241             return outputs[0]

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/backend/theano_backend.pyc in __call__(self, inputs)
    790     def __call__(self, inputs):
    791         assert type(inputs) in {list, tuple}
--> 792         return self.function(*inputs)
    793 
    794 

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    915                     node=self.fn.nodes[self.fn.position_of_error],
    916                     thunk=thunk,
--> 917                     storage_map=getattr(self.fn, 'storage_map', None))
    918             else:
    919                 # old-style linkers raise their own exceptions

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    901         try:
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)
    905         except Exception:

RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM
Apply node that caused the error: GpuDnnConv{algo='small', inplace=True, num_groups=1}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), dilation=(1, 1), conv_mode='conv', precision='float32', num_groups=1}.0, Constant{1.0}, Constant{0.0})
Toposort index: 238
Inputs types: [GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), <theano.gof.type.CDataType object at 0x7fe99837a5d0>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(16, 3, 226, 226), (64, 3, 3, 3), (16, 64, 224, 224), 'No shapes', (), ()]
Inputs strides: [(612912, 204304, 904, 4), (108, 36, 12, 4), (12845056, 200704, 896, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fe997e2f7e0>, 1.0, 0.0]
Outputs clients: [[GpuElemwise{Composite{(i0 * ((i1 + i2) + Abs((i1 + i2))))}}[(0, 1)]<gpuarray>(GpuArrayConstant{[[[[ 0.5]]]]}, GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, InplaceGpuDimShuffle{x,0,x,x}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "vgg16.py", line 127, in create
    self.ConvBlock(2, 64)
  File "vgg16.py", line 100, in ConvBlock
    model.add(Convolution2D(filters, 3, 3, activation='relu'))
  File "/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/models.py", line 312, in add
    output_tensor = layer(self.outputs[0])
  File "/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/bnrc/.conda/envs/DL-Py2/lib/python2.7/site-packages/keras/engine/topology.py", line 149, in create_node

Can someone help me, please? Thanks so much!


(Cedric Chee) #2

The problem looks like it was caused by an incorrect tensor size (shape) produced by the Keras Convolution2D layer that the ConvBlock function in vgg16.py adds. I suspect that either the input image size is incorrect or there is an issue with your environment.
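One way to sanity-check the shapes reported in the traceback is to compute what a 'valid' convolution should produce and compare it with the output buffer Theano allocated. The sketch below is only an illustration: `valid_conv_output_shape` is a hypothetical helper (not part of Keras or Theano), and it assumes the channels-first ("th") layout shown in the "Inputs shapes" line of the error.

```python
def valid_conv_output_shape(input_shape, filter_shape, subsample=(1, 1)):
    """Expected output shape of a 'valid' 2D convolution.

    Hypothetical helper for illustration only. Shapes are channels-first:
    input  = (batch, channels, height, width)
    filter = (n_filters, channels, kernel_height, kernel_width)
    """
    batch, in_channels, in_h, in_w = input_shape
    n_filters, filter_channels, k_h, k_w = filter_shape
    if in_channels != filter_channels:
        raise ValueError("channel mismatch: input has %d, filter expects %d"
                         % (in_channels, filter_channels))
    # 'valid' mode: no implicit padding, so each spatial axis shrinks by (kernel - 1).
    out_h = (in_h - k_h) // subsample[0] + 1
    out_w = (in_w - k_w) // subsample[1] + 1
    return (batch, n_filters, out_h, out_w)


# Shapes copied from the "Inputs shapes" line of the traceback.
padded_input = (16, 3, 226, 226)    # batch after zero-padding
filters = (64, 3, 3, 3)             # first convolution's weights
allocated_output = (16, 64, 224, 224)

expected = valid_conv_output_shape(padded_input, filters)
print(expected)                      # (16, 64, 224, 224)
print(expected == allocated_output)  # True
```

If the computed shape matches the allocated buffer, as it does for the numbers in this traceback, the shapes themselves are consistent, which would make the environment (for example the CUDA, cuDNN and Theano combination) the more likely suspect.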

Please give us more details about your development environment. The lesson1.ipynb notebook still works well for me as of today. Here’s my dev environment info:

  • OS: Ubuntu 16.04.4
  • CUDA 8.0.44 (and the compatible cuDNN version)
  • Anaconda 2
  • Keras 1.2.2 (pip install keras==1.2.2), with the config below in ~/.keras/keras.json. The "image_dim_ordering": "th" setting is SUPER IMPORTANT!!!
{
    "backend": "theano",
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32"
}
  • Theano (pip install theano) back-end configuration:
echo "[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda" > ~/.theanorc
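To make it easier to report the environment details asked for above, something like the following could be run in the same conda environment as the notebook. It is a minimal sketch: `collect_env_info` is a hypothetical helper, the package list is an assumption about what is relevant here, and it only reads the `__version__` attribute plus the ~/.keras/keras.json file mentioned in the list.

```python
import importlib
import json
import os
import platform
import sys


def collect_env_info(packages=("keras", "theano", "numpy")):
    """Gather version and config details worth pasting into a bug report.

    Hypothetical helper for illustration; it never raises if a package
    is missing or broken, it just records that fact.
    """
    info = {"python": sys.version.split()[0], "os": platform.platform()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # missing package or failing import
            info[name] = "not importable (%s)" % type(exc).__name__

    # Keras stores its backend settings in this file.
    config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    if os.path.exists(config_path):
        try:
            with open(config_path) as f:
                info["keras.json"] = json.load(f)
        except ValueError:
            info["keras.json"] = "present but not valid JSON"
    else:
        info["keras.json"] = "not found"
    return info


env = collect_env_info()
for key in sorted(env):
    print("%s: %s" % (key, env[key]))
```

Pasting that output into the thread, together with the contents of ~/.theanorc, should show whether the backend and image_dim_ordering match the setup listed above.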

As a last resort, I suggest you switch the Keras back-end to TensorFlow and give it a shot.
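Switching the back-end comes down to changing the "backend" field in keras.json. Here is a minimal sketch of doing that from Python, assuming the config file has the same keys as the one listed above. `set_keras_backend` is a hypothetical helper, and the demo writes to a temporary file rather than the real config so it is safe to run as-is.

```python
import json
import os
import tempfile


def set_keras_backend(config_path, backend):
    """Rewrite the "backend" field of a keras.json file, leaving other keys alone.

    Hypothetical helper for illustration only.
    """
    with open(config_path) as f:
        config = json.load(f)
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Demo on a throwaway copy of the config shown earlier in this thread.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "keras.json")
with open(demo_path, "w") as f:
    json.dump({"backend": "theano", "image_dim_ordering": "th",
               "epsilon": 1e-07, "floatx": "float32"}, f)

updated = set_keras_backend(demo_path, "tensorflow")
print(updated["backend"])             # tensorflow
print(updated["image_dim_ordering"])  # th
```

To apply it for real, pass os.path.expanduser("~/.keras/keras.json") as the path and restart the notebook kernel afterwards. Keras also reads the KERAS_BACKEND environment variable, which avoids editing the file. The sketch deliberately leaves image_dim_ordering at "th". Whether TensorFlow needs any other setting changed on your machine is something to verify rather than assume.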