Lesson 8 Discussion

Matthew · March 22, 2017, 5:35am

Make sure you’re using the most recent vgg16_avg.py
What versions of the following are you using?
Python
Keras
TensorFlow

Mine are:

Python: 3.6.0
Keras: 1.2.2
TensorFlow: 1.0.0

Can you paste the full error message, which includes the traceback?
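
For anyone answering the version question, here is a minimal sketch that collects the three versions in one go. The helper name report_versions is made up for illustration; it only relies on each package exposing __version__, which Keras and TensorFlow do as far as I know.

```python
import importlib
import platform


def report_versions(module_names):
    """Return {name: version string} for each module, tolerating missing ones."""
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Fall back to "unknown" if the package has no __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["keras", "tensorflow"]).items():
        print(f"{name}: {version}")
```

Pasting that output next to the full traceback makes it much easier to spot a Keras 1 vs. Keras 2 mismatch.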

davecg · March 22, 2017, 12:27pm

This might be a Keras 2 issue; it broke a few things.

abhishek21 · March 23, 2017, 2:33am

Thanks. Downgrading to Keras 1.2.2 fixed the issue!

brendan · March 23, 2017, 10:33pm

New paper on photorealistic style transfer generates crazy cool photos.

brianorwhatever · March 27, 2017, 11:07pm

Am I doing something wrong so that Keras isn’t using my GPU’s memory? My iterations take about 3 minutes each. That doesn’t seem too excessive, but I looked at the output of nvidia-smi and it claims none of the memory is in use.

This is the first time I’ve used Keras with TensorFlow, so I am wondering if I missed a setup step.
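
One way to check this kind of setup question is to ask TensorFlow which devices it can see. The sketch below assumes the device_lib helper that ships inside TensorFlow; the function names visible_devices and has_gpu are made up for illustration. As far as I know, TensorFlow reserves most of the GPU memory as soon as a session starts, so nvidia-smi showing none in use usually points to a CPU-only install rather than a Keras setting.

```python
def visible_devices():
    """Return the device names TensorFlow reports, or None if it cannot be imported."""
    try:
        # device_lib is TensorFlow's own helper for listing local devices.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [device.name for device in device_lib.list_local_devices()]


def has_gpu(device_names):
    """True if any reported device name mentions a GPU (e.g. '/gpu:0')."""
    return any("gpu" in name.lower() for name in (device_names or []))


if __name__ == "__main__":
    names = visible_devices()
    print("devices:", names)
    print("GPU visible:", has_gpu(names))
```

If only a CPU device is listed, the likely suspects are a CPU-only TensorFlow package or a CUDA/cuDNN install that TensorFlow cannot find.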

garima.agarwal · April 1, 2017, 10:56pm

I am way late in the game but starting to play with the notebooks.

Can someone please help me understand where this data is located for the neural-style2 notebook?

path = '/data/jhoward/imagenet/sample/'
dpath = '/data/jhoward/fast/imagenet/sample/'

I downloaded these

wget http://www.platform.ai/data/imagenet-sample-train.tar
wget http://www.platform.ai/data/trn_resized_288.tar
wget http://www.platform.ai/data/trn_resized_72.tar

but I can’t see the .pkl files in there, as needed in this:
fnames = pickle.load(open(dpath+'fnames.pkl', 'rb'))

Appreciate any help.

garima.agarwal · April 1, 2017, 11:13pm

Is that being done in imagenet_process.ipynb?

aifish · April 2, 2017, 5:24am

You can generate the .pkl files yourself:

fnames = list(glob.iglob(path+'fullset/*.JPEG'))
pickle.dump(fnames, open(path+'fnames.pkl', 'wb'))
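
A slightly more defensive sketch of the same idea, using only the standard library. The function names save_fnames and load_fnames are made up here, and sorting the list is my own choice for reproducibility, not something the notebook requires.

```python
import glob
import os
import pickle


def save_fnames(image_dir, out_path, pattern="*.JPEG"):
    """Collect matching image paths (sorted) and pickle the resulting list."""
    fnames = sorted(glob.glob(os.path.join(image_dir, pattern)))
    with open(out_path, "wb") as f:
        pickle.dump(fnames, f)
    return fnames


def load_fnames(pkl_path):
    """Load the pickled list of file names back."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)
```

Write the file to whichever directory the notebook later reads it from (dpath in the pickle.load line quoted above), otherwise the load will still fail.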

garima.agarwal · April 2, 2017, 8:29pm

Yes I got it… Thanks!

catblue88 · May 2, 2017, 7:53pm

Hi, I think sklearn, where the function fmin_l_bfgs_b comes from, doesn’t have GPU/CUDA support… [http://scikit-learn.org/stable/faq.html#will-you-add-gpu-support]
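
For reference, the traceback further down this thread shows fmin_l_bfgs_b living under scipy/optimize/lbfgsb.py, so it can be imported from scipy.optimize. Below is a toy sketch of how it is called; the target vector and the loss/grads functions are made up for illustration. The optimizer itself works on NumPy arrays on the CPU, while in the notebook the loss and gradient it calls are evaluated through the Keras backend (see the self.f(...) frame in that traceback).

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

target = np.array([1.0, -2.0, 3.0])  # made-up optimum for the toy problem


def loss(x):
    """Toy objective: squared distance to `target`."""
    return float(np.sum((x - target) ** 2))


def grads(x):
    """Analytic gradient of `loss`, returned as a float64 array."""
    return (2.0 * (x - target)).astype(np.float64)


# Same call shape as in solve_image: objective, flat start point, gradient, eval budget.
x_opt, min_val, info = fmin_l_bfgs_b(loss, np.zeros(3), fprime=grads, maxfun=20)
```

So even when the optimizer step is CPU-bound, the expensive part (evaluating the network) can still run on the GPU if the backend is configured for it.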

RogerS49 · May 4, 2017, 10:12am

I’ve got to the stage of creating a super-resolution network in lesson8 neural-style.ipynb. I am having a performance issue.

When I run
m_final.fit([arr_lr,arr_hr], targ, 16, 3,**params)
for the first time, I would expect it to utilise the GPU. However, on inspection the GPU has only 1MiB of memory in use, CPU RAM usage is 27GB, and the epoch progress report shows:

[loss: 156748.141] 5% 960/19439[29.33<9:21:34, 1.82s/it]

Is this the expected behaviour? I note that in the video @jeremy has these values:

[loss: 417266.594] 0% 4/2430 [00:05<1:04:04, 1.58s/it]

So I have far more data (19439 as opposed to 2430), since I am using the bcolz low-resolution (72) and high-resolution (288) data in their entirety.

OK, I see @catblue88 sums this up as a scikit-learn “no GPU support” issue.

However, we are using a VGG16 net, so I’m not sure scikit-learn even comes into the reckoning.

Originally I was using Keras V2 and found I had to revert to Keras V1 (V1.2.2 specifically). Has this had an impact on my environment, such that TensorFlow is no longer communicating correctly, even though I get the message:

Using Tensorflow backend

Any thoughts are much appreciated.

Aaah, the function fmin_l_bfgs_b that was attributed to scikit-learn above is used in the function solve_image, which comes before the super-resolution network part. My problem comes afterwards.

Methinks the culprit is the loss Lambda:

loss = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1, vgg2])

as this is a CPU calculation and not a TensorFlow function call.

Two things: what is Lambda? Is it a Python class? I can’t seem to get a Google/Bing fix on it. And how can it be replaced with a TensorFlow function?

I looked into the Keras documentation for Lambda and found:
“Wraps arbitrary expression as a Layer object”
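
To make the wrapped expression concrete, here is a NumPy sketch of the same arithmetic: a root-mean-square difference taken over axes 1 and 2. The array shapes and values are made up, and a channels-last layout (batch, height, width, channels) is an assumption. In the notebook the expression is written with K.* backend ops, which, as far as I understand, are built into the TensorFlow graph by the Lambda layer rather than executed as separate Python code.

```python
import numpy as np


def rms_diff(a, b):
    """sqrt(mean((a - b)**2)) over axes 1 and 2, mirroring the Lambda expression."""
    return np.sqrt(np.mean((a - b) ** 2, axis=(1, 2)))


# Made-up stand-ins for the vgg1 / vgg2 activations: (batch, height, width, channels).
a = np.zeros((2, 4, 4, 3))
b = np.full((2, 4, 4, 3), 2.0)
out = rms_diff(a, b)  # one value per (sample, channel)
```

Every element differs by 2 here, so each entry of out is 2.0 and the result has shape (2, 3).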

RogerS49 · May 4, 2017, 12:18pm

@jeremy Not sure I understand this completely; would it be possible to expand a little?
My issue is in the latter stages of the notebook, i.e. the super-resolution network: the fit is predicted to take more than 9 hrs. So I know the GPU is not being used, while 27GB of CPU RAM is. Can I change that so the backend performs the calculation on the GPU? Thanks for your time.

brendan · May 4, 2017, 3:22pm

Very cool recent development in style transfer / image analogy:

xinxin.li.seattle · May 4, 2017, 4:25pm

Nice, thanks for sharing this.

Here is a more theoretical paper that interprets the idea of the Gram matrix in style transfer.

jeremy · May 4, 2017, 9:34pm

I’m not exactly sure why you’re having this problem. None of your guesses in your post look right to me - I suspect it’s something to do with your config or install.

We covered Lambda in part 1. It’s part of keras, so you can find it in the docs there.

RogerS49 · May 5, 2017, 8:33am

@jeremy Thanks for the input; it was my configuration/installation. My issue has been resolved. Note that I now get 35.5 iterations per second instead of 1.81 seconds per iteration, and what took 9 hrs per epoch now takes less than 10 minutes, albeit with the second m_final.fit code cell (steps of 16 instead of 8 in the previous cell), as I saved the weights and reloaded them after reconfiguring. And why, you may ask? The reply is that the SIMD options on my GPU are now active: I built TensorFlow on my machine, so it picked up the options for the 1080 Ti GPU. Now the 3-epoch cell processing 19439 images takes 30 minutes instead of 30 hrs.
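
As a sanity check on those numbers, here is the arithmetic, reading the earlier "1.82s/it" progress-bar figure as seconds per iteration (an assumption, but the only reading consistent with a 9-hour epoch). The variable names are made up.

```python
n_items = 19439            # items per epoch, from the progress bar quoted earlier
slow_sec_per_item = 1.82   # "1.82s/it", read as seconds per iteration
fast_items_per_sec = 35.5  # rate reported after the rebuild

slow_epoch_hours = n_items * slow_sec_per_item / 3600
fast_epoch_minutes = n_items / fast_items_per_sec / 60
speedup = slow_sec_per_item * fast_items_per_sec

print(f"before: {slow_epoch_hours:.1f} h/epoch")
print(f"after:  {fast_epoch_minutes:.1f} min/epoch")
print(f"speedup: roughly {speedup:.0f}x")
```

That works out to a bit under 10 hours per epoch before and a bit over 9 minutes after, which matches the "9 hrs" and "less than 10 minutes" figures.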

ericm · May 12, 2017, 12:06pm

Has anyone else been getting the following:
InvalidArgumentError: Incompatible shapes: [64] vs. [128]

on the line:
x = solve_image(evaluator, iterations, x)
in the Recreate Style section?

I’ve had to make a few tweaks to the source code, since apparently I have Keras 2.0.3 and am doing this on Windows. It seems like the notebooks have a few things that aren’t quite right anymore as the software evolves (the dim_ordering -> data_format change in vgg16_avg.py, for instance). I’m sure this will get fixed by the time the open release for part II happens.
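
For anyone making the same tweak: in Keras 2 the dim_ordering argument was renamed to data_format, and its values changed from 'tf'/'th' to 'channels_last'/'channels_first'. A small sketch of that mapping follows; the helper name to_keras2_kwargs is made up, and it only rewrites a dict of keyword arguments rather than touching Keras itself.

```python
# Keras 1 value -> Keras 2 value for the renamed argument.
DIM_ORDERING_TO_DATA_FORMAT = {"tf": "channels_last", "th": "channels_first"}


def to_keras2_kwargs(kwargs):
    """Return a copy of layer kwargs with dim_ordering rewritten as data_format."""
    updated = dict(kwargs)
    if "dim_ordering" in updated:
        old_value = updated.pop("dim_ordering")
        updated["data_format"] = DIM_ORDERING_TO_DATA_FORMAT[old_value]
    return updated
```

For example, {"dim_ordering": "tf"} becomes {"data_format": "channels_last"}, and kwargs without dim_ordering pass through unchanged.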

justinho · May 15, 2017, 7:44am

Just a note that you should change the links for those files to the new location, since platform.ai is no longer available.

justinho · May 18, 2017, 8:14am

I ran into the same problem as you. I am running the code on Windows with Keras 2.0.3 and TensorFlow 1.0. Can anyone tell us how to fix it?

error info:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1021     try:
-> 1022       return fn(*args)
   1023     except errors.OpError as e:

H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1003                                  feed_dict, fetch_list, target_list,
-> 1004                                  status, run_metadata)
   1005 

H:\Anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
     65             try:
---> 66                 next(self.gen)
     67             except StopIteration:

H:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

InvalidArgumentError: Incompatible shapes: [64] vs. [128]
	 [[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add, mul_1)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-26-d733ab4b440a> in <module>()
----> 1 x = solve_image(evaluator, iterations, x)

<ipython-input-10-3aac02cd4344> in solve_image(eval_obj, niter, x)
      2     for i in range(niter):
      3         x, min_val, info = fmin_l_bfgs_b(eval_obj.loss, x.flatten(),
----> 4                                         fprime=eval_obj.grads, maxfun=20)
      5 
      6         x = np.clip(x, -127, 127)

H:\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py in fmin_l_bfgs_b(func, x0, fprime, args, approx_grad, bounds, m, factr, pgtol, epsilon, iprint, maxfun, maxiter, disp, callback, maxls)
    191 
    192     res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
--> 193                            **opts)
    194     d = {'grad': res['jac'],
    195          'task': res['message'],

H:\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    326             # until the completion of the current minimization iteration.
    327             # Overwrite f and g:
--> 328             f, g = func_and_grad(x)
    329         elif task_str.startswith(b'NEW_X'):
    330             # new iteration

H:\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py in func_and_grad(x)
    276     else:
    277         def func_and_grad(x):
--> 278             f = fun(x, *args)
    279             g = jac(x, *args)
    280             return f, g

H:\Anaconda3\lib\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
    290     def function_wrapper(*wrapper_args):
    291         ncalls[0] += 1
--> 292         return function(*(wrapper_args + args))
    293 
    294     return ncalls, function_wrapper

<ipython-input-9-8a811da94d93> in loss(self, x)
      5 
      6     def loss(self, x):
----> 7         loss_, self.grad_values = self.f([x.reshape(self.shp)])
      8         return loss_.astype(np.float64)
      9 

H:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2101         session = get_session()
   2102         updated = session.run(self.outputs + [self.updates_op],
-> 2103                               feed_dict=feed_dict)
   2104         return updated[:len(self.outputs)]
   2105 

H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    765     try:
    766       result = self._run(None, fetches, feed_dict, options_ptr,
--> 767                          run_metadata_ptr)
    768       if run_metadata:
    769         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    963     if final_fetches or final_targets:
    964       results = self._do_run(handle, final_targets, final_fetches,
--> 965                              feed_dict_string, options, run_metadata)
    966     else:
    967       results = []

H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1013     if handle is None:
   1014       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015                            target_list, options, run_metadata)
   1016     else:
   1017       return self._do_call(_prun_fn, self._session, handle, feed_dict,

H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1033         except KeyError:
   1034           pass
-> 1035       raise type(e)(node_def, op, message)
   1036 
   1037   def _extend_graph(self):

InvalidArgumentError: Incompatible shapes: [64] vs. [128]
	 [[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add, mul_1)]]

Caused by op 'add_1', defined at:
  File "H:\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "H:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "H:\Anaconda3\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "H:\Anaconda3\lib\site-packages\traitlets\config\application.py", line 653, in launch_instance
    app.start()
  File "H:\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "H:\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 162, in start
    super(ZMQIOLoop, self).start()
  File "H:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 887, in start
    handler_func(fd_obj, events)
  File "H:\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "H:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "H:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "H:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "H:\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "H:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "H:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "H:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "H:\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "H:\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "H:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "H:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2821, in run_ast_nodes
    if self.run_code(code, result):
  File "H:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-24-69800a5bb08b>", line 1, in <module>
    loss = sum(style_loss(l1[0], l2[0])*w for l1,l2,w in zip(style_layers, style_targ, wgts))
  File "H:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 794, in binary_op_wrapper
    return func(x, y, name=name)
  File "H:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 73, in add
    result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "H:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "H:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "H:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [64] vs. [128]
	 [[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add, mul_1)]]
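
One possible reading of this error, not confirmed anywhere in this thread: the failing op is the sum(...) over per-layer style losses, and 64 and 128 match the channel counts of VGG16's first two conv blocks. That is what you would see if style_loss returned one value per channel instead of a single scalar, which I believe can happen when an MSE helper averages only over the last axis (as Keras 2's does, as far as I know). The NumPy sketch below reproduces that failure mode with made-up data; gram and mse_last_axis are stand-ins, not the notebook's functions.

```python
import numpy as np


def gram(features):
    """Gram matrix of a (positions, channels) array: channels x channels."""
    return features.T @ features / features.shape[0]


def mse_last_axis(x, y):
    """Mean squared error over the last axis only: one value per row."""
    return np.mean((x - y) ** 2, axis=-1)


rng = np.random.default_rng(0)
# Made-up activations for two layers with 64 and 128 channels.
layers = [(rng.normal(size=(10, c)), rng.normal(size=(10, c))) for c in (64, 128)]

per_layer = [mse_last_axis(gram(x), gram(t)) for x, t in layers]
shapes = [p.shape for p in per_layer]  # vectors of length 64 and 128

try:
    sum(per_layer)  # adding a length-64 vector to a length-128 vector
    mismatch = False
except ValueError:
    mismatch = True

# Reducing each layer's loss to a scalar first makes the sum well defined.
total = sum(np.mean(p) for p in per_layer)
```

If that is what is happening, wrapping the per-layer loss in a mean (K.mean in the notebook's terms) before summing would be the analogous fix; downgrading to Keras 1.2.2, as reported earlier in the thread, is the other route.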

justinho · May 19, 2017, 7:27am

Oh, the link should be file.fast.ai, right? Jeremy, you typed it as file.fastai.