- Make sure you’re using the most recent vgg16_avg.py
- What are your versions of the following?:
- Python
- Keras
- TensorFlow
Mine are:
- Python: 3.6.0
- Keras: 1.2.2
- TensorFlow: 1.0.0
Can you paste the full error message, which includes the traceback?
This might be a Keras 2 issue; it broke a few things.
Thanks. Downgrading to Keras 1.2.2 fixed the issue!
Am I doing something wrong so that Keras isn’t using my GPU’s memory? My iterations take about 3 minutes each. That doesn’t seem too excessive, but I looked at the output of nvidia-smi and it claims none of the memory is in use.
This is the first time I’ve used Keras with TensorFlow, so I’m wondering if I missed a setup step.
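One way to watch GPU memory from Python while a fit is running is to query nvidia-smi directly. This is only a minimal sketch: the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration, the sample string is hypothetical, and it assumes an NVIDIA driver with nvidia-smi on the PATH.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (MiB), one line per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use (needs an NVIDIA driver installed)."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)


# Hypothetical sample output for a single GPU that is essentially idle.
sample = "1, 11172\n"
for used, total in parse_gpu_memory(sample):
    print("GPU memory: %d / %d MiB in use" % (used, total))
```

If the used figure stays near zero during training, the backend is most likely running on the CPU, which points to the install or configuration rather than to the notebook code.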
I am way late in the game but starting to play with the notebooks.
Can someone please help me understand where this data is located for the neural-style2 notebook?
path = '/data/jhoward/imagenet/sample/'
dpath = '/data/jhoward/fast/imagenet/sample/'
I downloaded these
wget http://www.platform.ai/data/imagenet-sample-train.tar
wget http://www.platform.ai/data/trn_resized_288.tar
wget http://www.platform.ai/data/trn_resized_72.tar
but I can’t see the .pkl files in there, as needed for this:
fnames = pickle.load(open(dpath+'fnames.pkl', 'rb'))
Appreciate any help.
Is that being done in imagenet_process.ipynb?
You can generate the .pkl files yourself:
import glob, pickle
fnames = list(glob.iglob(path+'fullset/*.JPEG'))
pickle.dump(fnames, open(path+'fnames.pkl', 'wb'))
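A slightly more defensive sketch of the same idea, which closes the file handles and sorts the names so the order is reproducible. The function names are hypothetical, and it assumes the images sit in a `fullset/` subfolder as in the snippet above.

```python
import glob
import os
import pickle


def build_fnames_index(path, pattern="fullset/*.JPEG"):
    """Collect image file names under `path` and save them as fnames.pkl."""
    fnames = sorted(glob.glob(os.path.join(path, pattern)))
    with open(os.path.join(path, "fnames.pkl"), "wb") as f:
        pickle.dump(fnames, f)
    return fnames


def load_fnames_index(path):
    """Read back the list written by build_fnames_index."""
    with open(os.path.join(path, "fnames.pkl"), "rb") as f:
        return pickle.load(f)
```

Note that the notebook loads fnames.pkl from dpath while this writes next to the images, so pass whichever folder your copy of the notebook reads from.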
Yes I got it… Thanks!
Hi, I think sklearn, which is where the function fmin_l_bfgs_b comes from, doesn’t have GPU/CUDA support: [http://scikit-learn.org/stable/faq.html#will-you-add-gpu-support]
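For reference, the traceback later in this thread shows fmin_l_bfgs_b living in scipy\optimize\lbfgsb.py, so it is a SciPy function rather than a scikit-learn one. The optimizer itself works on NumPy arrays on the CPU, but the loss and gradient callbacks it calls can still be evaluated by the backend on a GPU. Below is a minimal sketch of the solve_image-style loop; the toy quadratic loss and the `Evaluator` internals are stand-ins I made up in place of the Keras function, not the notebook's code.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


class Evaluator:
    """Computes loss and gradient together and caches the gradient."""

    def __init__(self, target):
        self.target = target
        self.grad_values = None

    def loss(self, x):
        # Toy stand-in for the Keras function: squared distance to a target.
        diff = x.reshape(self.target.shape) - self.target
        self.grad_values = (2 * diff).flatten()
        return float(np.sum(diff ** 2))

    def grads(self, x):
        # SciPy calls this right after loss() with the same x.
        return self.grad_values.astype(np.float64)


target = np.array([[1.0, -2.0], [3.0, 0.5]])
evaluator = Evaluator(target)
x = np.zeros(target.shape)

for i in range(3):
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    x = np.clip(x, -127, 127)
    print("iteration", i, "loss", min_val)
```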
I’ve got to the stage of creating a super-resolution network in lesson8 neural-style.ipynb. I am having a performance issue.
When I run
m_final.fit([arr_lr,arr_hr], targ, 16, 3,**params)
the first time,
I would expect this code to utilise the GPU. However, on inspection the GPU has only 1MiB of memory used. The CPU RAM usage is 27GB and the epoch report suggests
[loss: 156748.141] 5% 960/19439[29.33<9:21:34, 1.82s/it]
Is this the expected behaviour? I note that in the video @jeremy has these values:
[loss: 417266.594] 0% 4/2430 [00:05<1:04:04, 1.58s/it]
So I have a lot more data (19439 as opposed to 2430): the bcolz low-resolution (72) and high-resolution (288) data in their entirety.
OK, I see @catblue88 sums this up as a scikit-learn no-GPU-support issue.
However, we are using a VGG16 net, so I’m not sure scikit-learn even comes into the reckoning.
Originally I was using Keras 2 and found I had to revert to Keras 1, specifically 1.2.2. Has this had an impact on my environment, so that TensorFlow is no longer communicating correctly, even though I get the message:
Using Tensorflow backend
Any thoughts are much appreciated.
Aaah, the function fmin_l_bfgs_b is scikit-learn, and it is used in the function solve_image, before the super-resolution network part. My problem comes afterwards.
Methinks the culprit is the loss lambda
loss = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1, vgg2])
as this is a CPU calculation and not a TensorFlow function call.
Two things: what is Lambda? Is it a Python class? I can’t seem to get a Google/Bing fix on it.
And how can it be replaced with a TensorFlow function?
I looked into the Keras documentation for Lambda and found
Wraps arbitrary expression as a Layer object
@jeremy I’m not sure I understand this completely; would it be possible to expand a little?
My issue is in the later stages of the notebook, i.e. the super-resolution network: the fit is predicted to take more than 9 hours. So I know that the GPU is not being used, while 27GB of CPU RAM is. Can I change that so the backend performs the calculation? Thanks for your time.
Very cool recent development in style transfer / image analogy:
Nice, thanks for sharing this.
Here is a more theoretical paper that interprets the idea of the Gram matrix in style transfer.
I’m not exactly sure why you’re having this problem. None of your guesses in your post look right to me - I suspect it’s something to do with your config or install.
We covered Lambda in part 1. It’s part of keras, so you can find it in the docs there.
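To make the Lambda point concrete: as far as I understand it, the expression inside the Lambda is built from Keras backend ops (K.sqrt, K.mean), so it becomes part of the backend graph instead of running as plain Python on each batch. The NumPy sketch below only illustrates what that expression computes; the function name and the array shapes are hypothetical, not taken from the notebook.

```python
import numpy as np


def rms_feature_diff(a, b):
    """NumPy analogue of K.sqrt(K.mean((a - b)**2, (1, 2)))."""
    return np.sqrt(np.mean((a - b) ** 2, axis=(1, 2)))


# Hypothetical activations: batch of 2, 3x3 spatial grid, 4 channels.
vgg1 = np.zeros((2, 3, 3, 4))
vgg2 = np.full((2, 3, 3, 4), 2.0)

loss = rms_feature_diff(vgg1, vgg2)
print(loss.shape)  # (2, 4): one value per image and per channel
print(loss[0])     # [2. 2. 2. 2.]
```

Averaging over axes 1 and 2 collapses the spatial grid, leaving a root-mean-square difference for each channel of each image.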
@jeremy Thanks for the input; it was my configuration/installation. My issue has been resolved. Note that my rate is now 35.5 it/s instead of 1.81 s/it, and what took 9 hours per epoch now takes less than 10 minutes, albeit with the second m_final.fit code cell (steps of 16 instead of 8 in the previous cell), as I saved the weights and reloaded them after reconfiguring. And why, you may ask? The reply is that the SIMD options on my GPU are now active. I built TensorFlow on my machine, so it took the option of the 1080 Ti GPU. Now the 3-epoch cell processing 19439 images takes 30 minutes instead of 30 hours.
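The before and after figures can be sanity-checked with a little arithmetic. This sketch assumes the progress bar counts images (19439 per epoch), which is my reading of the output quoted earlier rather than something stated in the thread.

```python
def epoch_seconds(n_items, seconds_per_item):
    """Time for one pass over n_items at a fixed per-item cost."""
    return n_items * seconds_per_item


n_images = 19439
before = epoch_seconds(n_images, 1.82)      # 1.82 s/it from the first report
after = epoch_seconds(n_images, 1 / 35.5)   # 35.5 it/s after the rebuild

print("before: %.1f hours per epoch" % (before / 3600))
print("after:  %.1f minutes per epoch" % (after / 60))
print("speed-up: about %.0fx" % (before / after))
```

Under that assumption the numbers line up with the post: a bit under 10 hours per epoch before, a bit over 9 minutes after.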
Has anyone else been getting the following:
InvalidArgumentError: Incompatible shapes: [64] vs. [128]
on the line:
x = solve_image(evaluator, iterations, x)
in the Recreate Style section?
I’ve had to make a few tweaks to the source code, since apparently I have Keras 2.0.3 and am doing this on Windows. It seems like the notebooks have a few things that aren’t quite right anymore as the software evolves. (the dim_ordering-> data_format in vgg16_avg.py for instance) I’m sure this will get fixed by the time the open release for part II happens.
Just a note: the links to those files should be changed to the new ones, since platform.ai is no longer available.
I ran into the same problem as you. I’m running the code on Windows with Keras 2.0.3 and TensorFlow 1.0. Can anyone tell us how to fix it?
error info:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1021 try:
-> 1022 return fn(*args)
1023 except errors.OpError as e:
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1003 feed_dict, fetch_list, target_list,
-> 1004 status, run_metadata)
1005
H:\Anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
65 try:
---> 66 next(self.gen)
67 except StopIteration:
H:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
InvalidArgumentError: Incompatible shapes: [64] vs. [128]
[[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add, mul_1)]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-26-d733ab4b440a> in <module>()
----> 1 x = solve_image(evaluator, iterations, x)
<ipython-input-10-3aac02cd4344> in solve_image(eval_obj, niter, x)
2 for i in range(niter):
3 x, min_val, info = fmin_l_bfgs_b(eval_obj.loss, x.flatten(),
----> 4 fprime=eval_obj.grads, maxfun=20)
5
6 x = np.clip(x, -127, 127)
H:\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py in fmin_l_bfgs_b(func, x0, fprime, args, approx_grad, bounds, m, factr, pgtol, epsilon, iprint, maxfun, maxiter, disp, callback, maxls)
191
192 res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
--> 193 **opts)
194 d = {'grad': res['jac'],
195 'task': res['message'],
H:\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
326 # until the completion of the current minimization iteration.
327 # Overwrite f and g:
--> 328 f, g = func_and_grad(x)
329 elif task_str.startswith(b'NEW_X'):
330 # new iteration
H:\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py in func_and_grad(x)
276 else:
277 def func_and_grad(x):
--> 278 f = fun(x, *args)
279 g = jac(x, *args)
280 return f, g
H:\Anaconda3\lib\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
290 def function_wrapper(*wrapper_args):
291 ncalls[0] += 1
--> 292 return function(*(wrapper_args + args))
293
294 return ncalls, function_wrapper
<ipython-input-9-8a811da94d93> in loss(self, x)
5
6 def loss(self, x):
----> 7 loss_, self.grad_values = self.f([x.reshape(self.shp)])
8 return loss_.astype(np.float64)
9
H:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
2101 session = get_session()
2102 updated = session.run(self.outputs + [self.updates_op],
-> 2103 feed_dict=feed_dict)
2104 return updated[:len(self.outputs)]
2105
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
765 try:
766 result = self._run(None, fetches, feed_dict, options_ptr,
--> 767 run_metadata_ptr)
768 if run_metadata:
769 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
963 if final_fetches or final_targets:
964 results = self._do_run(handle, final_targets, final_fetches,
--> 965 feed_dict_string, options, run_metadata)
966 else:
967 results = []
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1013 if handle is None:
1014 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015 target_list, options, run_metadata)
1016 else:
1017 return self._do_call(_prun_fn, self._session, handle, feed_dict,
H:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1033 except KeyError:
1034 pass
-> 1035 raise type(e)(node_def, op, message)
1036
1037 def _extend_graph(self):
InvalidArgumentError: Incompatible shapes: [64] vs. [128]
[[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add, mul_1)]]
Caused by op 'add_1', defined at:
File "H:\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "H:\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "H:\Anaconda3\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
app.launch_new_instance()
File "H:\Anaconda3\lib\site-packages\traitlets\config\application.py", line 653, in launch_instance
app.start()
File "H:\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "H:\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 162, in start
super(ZMQIOLoop, self).start()
File "H:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "H:\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "H:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "H:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "H:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "H:\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "H:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "H:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "H:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "H:\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "H:\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "H:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "H:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "H:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-24-69800a5bb08b>", line 1, in <module>
loss = sum(style_loss(l1[0], l2[0])*w for l1,l2,w in zip(style_layers, style_targ, wgts))
File "H:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 794, in binary_op_wrapper
return func(x, y, name=name)
File "H:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 73, in add
result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
File "H:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
op_def=op_def)
File "H:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "H:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [64] vs. [128]
[[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add, mul_1)]]
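One possible cause, not confirmed in this thread: the failing op is the Add inside the sum over style layers, and 64 and 128 look like the channel counts of two different layers. If style_loss is built on a mean-squared-error function that averages over the last axis only (as I believe the Keras 2 version does), each layer returns a vector with one entry per channel, and vectors of different lengths cannot be added. The NumPy sketch below reproduces that failure mode with hypothetical activations; `gram_matrix` and the two loss functions are my own stand-ins, not the notebook's code.

```python
import numpy as np


def gram_matrix(x):
    """Channel-by-channel correlations of an (h, w, c) activation map."""
    feats = x.transpose(2, 0, 1).reshape(x.shape[2], -1)
    return feats @ feats.T / x.size


def style_loss_per_row(x, targ):
    # Mean over the last axis only, so the result has one entry per channel.
    return np.mean((gram_matrix(x) - gram_matrix(targ)) ** 2, axis=-1)


def style_loss_scalar(x, targ):
    # Reducing over every axis yields a single number per layer.
    return np.mean(style_loss_per_row(x, targ))


rng = np.random.RandomState(0)
# Hypothetical activations for two layers with 64 and 128 channels.
layers = [rng.rand(8, 8, 64), rng.rand(4, 4, 128)]
targs = [rng.rand(8, 8, 64), rng.rand(4, 4, 128)]

per_row = [style_loss_per_row(l, t) for l, t in zip(layers, targs)]
print([p.shape for p in per_row])  # [(64,), (128,)]

try:
    sum(per_row)  # adds a (64,) vector to a (128,) vector
except ValueError as err:
    print("cannot sum:", err)

total = sum(style_loss_scalar(l, t) for l, t in zip(layers, targs))
print(np.ndim(total))  # 0
```

If this is what is happening, wrapping the per-layer loss in K.mean(...) so that each layer contributes a scalar, or downgrading to Keras 1.2.2 as reported earlier in the thread, would be the things to try.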