Lesson 1 discussion

Hi Daniel,
I received an email letting me know that customer support filed a request with technical support for the upgrade. 2-3 hours later the limit was raised. You can check whether the upgrade is done in the AWS dashboard under EC2 -> Limits.

Thanks Alexandre, I just got their reply too, and according to EC2 -> Limits, my p2.xlarge limit is now 1.

I’m using a p2 instance, and I also had a problem with the instance not finding the GPU drivers, but a couple of reboots solved both problems.

1 Like

In this lesson, batch sizes are introduced, along with the issue of fitting a batch into GPU memory for computational efficiency. My understanding is that each SGD update is computed on a whole batch, so different batch sizes can change the process. Many great (and valuable) specific recommendations are given in the lectures: 3x3 kernels, 1/3 to 1/4 pseudo-labeling, etc.
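To make the point concrete, here is a minimal NumPy sketch of mini-batch SGD on a toy one-parameter regression (everything in it is made up for illustration, not taken from the course notebooks). Each update uses one batch, so the batch size controls how many updates you get per epoch and how noisy each one is:

```python
import numpy as np

# Toy data for this sketch: y = 3x + noise
rng = np.random.RandomState(0)
X = rng.rand(256)
y = 3 * X + 0.1 * rng.randn(256)

def sgd(batch_size, epochs=50, lr=0.5):
    """Fit a single weight w to minimise mean((w*x - y)^2), one batch at a time."""
    w = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            grad = 2 * np.mean((w * xb - yb) * xb)  # gradient of this batch's MSE
            w -= lr * grad
    return w

# Small batches: many noisy updates per epoch; large batches: few smooth ones.
print(sgd(batch_size=4), sgd(batch_size=64))
```

Both runs land near the true slope of 3; the small-batch run simply takes many more, noisier steps to get there.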

Are there any rules of thumb for batch_size? I know it can be too small. Can it be too large?

I’m running lesson1.ipynb and my datasets are in place, but I think it is not able to instantiate Vgg16 because I can’t see any models downloaded. I manually downloaded the model and placed it in the models sub-directory, but I’m still getting the same error. Please help.

vgg = Vgg16()

# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)

vgg.finetune(batches)

vgg.fit(batches, val_batches, nb_epoch=1)

/home/prany/anaconda3/lib/python3.5/site-packages/keras/layers/core.py:577: UserWarning: output_shape argument not specified for layer lambda_2 and cannot be automatically inferred with the Theano backend. Defaulting to output shape (None, 3, 224, 224) (same as input shape). If the expected output shape is different, specify it via the output_shape argument.
.format(self.name, input_shape))


OSError Traceback (most recent call last)
in ()
----> 1 vgg = Vgg16()
2 # Grab a few images at a time for training and validation.
3 # NB: They must be in subdirectories named based on their category
4 batches = vgg.get_batches(path+'train', batch_size=batch_size)
5 val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)

/home/prany/courses-master/PranY/nbs/vgg16.py in __init__(self)
31 def __init__(self):
32 self.FILE_PATH = 'http://www.platform.ai/models/'
---> 33 self.create()
34 self.get_classes()
35

/home/prany/courses-master/PranY/nbs/vgg16.py in create(self)
80
81 fname = 'vgg16.h5'
---> 82 model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))
83
84

/home/prany/anaconda3/lib/python3.5/site-packages/keras/engine/topology.py in load_weights(self, filepath, by_name)
2688 '''
2689 import h5py
-> 2690 f = h5py.File(filepath, mode='r')
2691 if 'layer_names' not in f.attrs and 'model_weights' in f:
2692 f = f['model_weights']

/home/prany/anaconda3/lib/python3.5/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, **kwds)
270
271 fapl = make_fapl(driver, libver, **kwds)
--> 272 fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
273
274 if swmr_support:

/home/prany/anaconda3/lib/python3.5/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
90 if swmr and swmr_support:
91 flags |= h5f.ACC_SWMR_READ
---> 92 fid = h5f.open(name, flags, fapl=fapl)
93 elif mode == 'r+':
94 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)()

h5py/h5f.pyx in h5py.h5f.open (/home/ilan/minonda/conda-bld/work/h5py/h5f.c:1942)()

OSError: Unable to open file (Truncated file: eof = 511934464, sblock->base_addr = 0, stored_eoa = 553482496)

1 Like

You probably wouldn’t want to go higher than 128, since larger batches take more time and have little if any upside.

2 Likes

It looks like the file didn’t download properly. It should be downloaded automatically. Try deleting the copy in ~/.keras/models and running it again.
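One quick way to sanity-check the cached file before deleting it: every valid HDF5 file starts with a fixed 8-byte signature, so a download that actually saved an HTML error page fails this check immediately. (A file truncated mid-download, as in the error above, can still have a valid header, so when in doubt, delete and re-download as suggested.) The helper name and path below are just for illustration:

```python
import os

# Every valid HDF5 file begins with this fixed 8-byte signature.
HDF5_MAGIC = b'\x89HDF\r\n\x1a\n'

def looks_like_hdf5(path):
    """Return True if the file exists and starts with the HDF5 signature."""
    if not os.path.exists(path):
        return False
    with open(path, 'rb') as f:
        return f.read(8) == HDF5_MAGIC

# Hypothetical usage against the Keras cache location:
print(looks_like_hdf5(os.path.expanduser('~/.keras/models/vgg16.h5')))
```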

4 Likes

Did you restart your notebook after this?

I think this fits best in here, although it’s part of the lesson 2 video:

@jeremy what do you use to collapse markdown headings in jupyter as shown with your solution of the lesson 1 homework?

I googled a bit but did not find anything that looks right

Thank you for this amazing course

I can run Lesson 1 on AWS without a problem. Now I want to test it on my own computer locally. The first few lines of code are fine, until here:

# Import our class, and instantiate
from vgg16 import Vgg16
vgg = Vgg16()

The error message comes from the last line. Any idea what’s going on? I am using Theano and Python 2.7.

Using Theano backend.
(Subtensor{int64}.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, Subtensor{int64}.0)
[the line above is printed 13 times in the original output]
Traceback (most recent call last):
File "/home/abigail/workspace/fast-ai/my_lesson_1.py", line 21, in
vgg = Vgg16()
File "/home/abigail/workspace/fast-ai/vgg16.py", line 33, in __init__
self.create(size, include_top)
File "/home/abigail/workspace/fast-ai/vgg16.py", line 85, in create
model.add(Flatten())
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/models.py", line 312, in add
output_tensor = layer(self.outputs[0])
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 514, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 152, in create_node
output_shapes = to_list(outbound_layer.get_output_shape_for(input_shapes[0]))
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/layers/core.py", line 402, in get_output_shape_for
'(got ' + str(input_shape[1:]) + '. '
Exception: The shape of the input to "Flatten" is not fully defined (got (0, 7, 512). Make sure to pass a complete "input_shape" or "batch_input_shape" argument to the first layer in your model.

I thought my computer might just be slower to process the data, but I don’t know why it generated such an error. Thank you.

@martin

Not a direct answer to your question. Might be totally off, so apologies in advance.

Did you change your local Keras to use Theano?

You should check the ~/.keras/keras.json file, in particular the image_dim_ordering value.

It should look like this:

{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}

I am using my local Mac to run everything end to end on sample data. It worked fine.

@jagatsingh, as you can see from the printed message: "Using Theano backend."

It turned out that I needed to change two settings here:

{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}

  1. backend: from tensorflow to theano
  2. image_dim_ordering: from tf to th

This is something new I learned: I needed to change both in order to switch the backend from TensorFlow to Theano.
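The two settings have to stay consistent, which is easy to check with a few lines of stdlib Python (the helper name here is made up for this sketch):

```python
import json
import os

# 'th' ordering (channels first) pairs with Theano; 'tf' (channels last) with TensorFlow.
EXPECTED_ORDERING = {'theano': 'th', 'tensorflow': 'tf'}

def keras_config_consistent(path):
    """Return True if backend and image_dim_ordering in a keras.json file agree."""
    with open(path) as f:
        cfg = json.load(f)
    return cfg.get('image_dim_ordering') == EXPECTED_ORDERING.get(cfg.get('backend'))

# Hypothetical usage:
# keras_config_consistent(os.path.expanduser('~/.keras/keras.json'))
```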

Hi everyone,

Ctrl+b for tmux does not work. Only Ctrl+d helps me exit tmux.

I think tmux is useful, but I can’t find a solution online. Could anyone give me an idea how to solve this problem?

I attempted to use tmux from macOS Sierra 10.12.1, at ubuntu@ip-10-0-0-7:~/nbs$

Thanks

Daniel

Ctrl+b does not give you any feedback; press Ctrl+b and then d afterwards to exit.

Do not use Ctrl+d by itself.

1 Like

Hi, @jagatsingh,

What’s the memory size of your laptop? I received this error message, and I am afraid it’s due to a memory limit:

22976/23000 [============================>.] - ETA: 8s - loss: 0.2168 - acc: 0.9570 Traceback (most recent call last):
File "/home/abigail/workspace/fast-ai/my_lesson_1.py", line 27, in
vgg.fit(batches, val_batches, nb_epoch=1)
File "/home/abigail/workspace/fast-ai/vgg16.py", line 126, in fit
validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/models.py", line 882, in fit_generator
pickle_safe=pickle_safe)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/training.py", line 1491, in fit_generator
pickle_safe=pickle_safe)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/training.py", line 1578, in evaluate_generator
outs = self.test_on_batch(x, y, sample_weight=sample_weight)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/training.py", line 1277, in test_on_batch
outputs = self.test_function(ins)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 792, in __call__
return self.function(*inputs)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/abigail/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
RuntimeError: BaseCorrMM: Failed to allocate output of 128 x 64 x 224 x 224
Apply node that caused the error: CorrMM{valid, (1, 1)}(IncSubtensor{InplaceSet;::, ::, int64:int64:, int64:int64:}.0, Subtensor{::, ::, ::int64, ::int64}.0)
Toposort index: 86
Inputs types: [TensorType(float32, 4D), TensorType(float32, 4D)]
Inputs shapes: [(128, 3, 226, 226), (64, 3, 3, 3)]
Inputs strides: [(612912, 204304, 904, 4), (108, 36, -12, -4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{(i0 * (Abs((i1 + i2)) + i1 + i2))}}[(0, 1)](TensorConstant{(1, 1, 1, 1) of 0.5}, CorrMM{valid, (1, 1)}.0, Reshape{4}.0)]]

Backtrace when the node is created (use Theano flag traceback.limit=N to make it longer):
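For a sense of scale on the failed allocation above (128 x 64 x 224 x 224): that single activation tensor alone, in float32, works out to about 1.5 GiB, so on a card with limited memory a smaller batch_size is worth trying. The arithmetic:

```python
# The allocation that failed above: batch x filters x height x width, in float32.
batch, filters, height, width = 128, 64, 224, 224
bytes_needed = batch * filters * height * width * 4  # 4 bytes per float32
print('%.2f GiB' % (bytes_needed / float(2**30)))  # prints "1.53 GiB"
```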

Why is batch_size doubled for validation batches in the 7 line example?

batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)

2 Likes

Thanks a lot! I used my download manager to download and copy the files manually. It worked. But I saw a directory named "models" as a sub-dir of the "data" folder in your video. After reading the vgg16.py code, I thought that’s where I needed to put the models; however, as you mentioned, it should be in ~/.keras/models. Why is that?

@martin I too received a similar memory error, but I could bet that my GTX 980M with 4 GB is sufficient. I dug deeper and found that after restarting the kernel, the system frees a lot of GPU memory, which somehow gets piled up with every run. I don’t know why gc.collect() doesn’t empty the GPU cache. So just exit and do a clean run.
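For what it’s worth, the pattern for releasing Python-side references between runs is sketched below; note that gc.collect() only reclaims Python objects, and Theano’s GPU allocations are typically only released when the process exits, which is consistent with a kernel restart being the reliable fix:

```python
import gc

model = {'weights': list(range(10000))}  # stand-in for a Keras model in this sketch
del model                # drop the Python-side reference to the model
freed = gc.collect()     # reclaim reference cycles; backend GPU memory may persist
print('gc.collect() reported %d unreachable objects' % freed)
```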

Also, I read about the gcc compiler having some memory leaks in versions 4.8 and below. If at any point Theano fails to recognize your GPU, simply update your gcc compiler and link the defaults to the newer alternatives. Hope this helps!

Hi,

My machine has 16 GB, and I was not running on the full data. I reduced it so I could learn on my local machine.

Hi Prany,

I restarted my computer and it runs fine now. Somehow my Linux was using a lot of swap memory, and restarting the machine solved the problem.