Lesson 2 discussion

Matthew · January 10, 2017, 4:25pm

@cmeff1
I’ve had different memory problems than yours, but what solves them for me is shutting down any other GPU-related notebooks, restarting my current one, and sometimes even restarting my computer. I use an NVIDIA GTX 1070.
I hope this “restart everything” advice helps. Might be worth a shot while waiting for real advice.

cmeff1 · January 10, 2017, 11:14pm

@Matthew Yeah, I tried the restart. How much memory does your machine have, and how much memory does your graphics card have? This is the memory error I’m still getting.

trn_data_a = get_batches(path+'train', shuffle=False, batch_size=1, class_mode=None, target_size=(224,224))

Found 23000 images belonging to 2 classes.
In [20]:

trn_data = np.concatenate([trn_data_a.next() for i in range(trn_data_a.nb_sample)])

MemoryError Traceback (most recent call last)
in ()
----> 1 trn_data = np.concatenate([trn_data_a.next() for i in range(trn_data_a.nb_sample)])

MemoryError:
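
For scale, here is a rough estimate of what that concatenated array would need. It is only a sketch: it assumes each image comes back as a float32 array of shape (3, 224, 224), which the post does not state, and array_bytes is a hypothetical helper name.

```python
import numpy as np

def array_bytes(n_images, item_shape=(3, 224, 224), dtype=np.float32):
    """Bytes needed to hold n_images arrays of item_shape in one ndarray."""
    n_values = n_images * int(np.prod(item_shape))
    return n_values * np.dtype(dtype).itemsize

total = array_bytes(23000)
print("%.1f GB for the final array" % (total / 1e9))

# np.concatenate over a list keeps the list of pieces and the result alive
# at the same time, so peak usage is roughly double the final size.
print("%.1f GB at peak" % (2 * total / 1e9))
```

Under those assumptions the final array alone is about 13.8 GB, so a MemoryError on a typical desktop is not surprising.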

Gelu74 · January 11, 2017, 2:32pm

At the end of the lesson 2 notebook we are shown how to fine-tune more layers in Keras. I’ve followed that approach and I don’t seem to get any improvement in accuracy; in fact, my Kaggle submission with more layers trained is worse than the one from fine-tuning just the last layer.
Has someone encountered a similar result?
In addition, I’ve tried to train the whole model, setting all layers to trainable=True, and the accuracy of the model gets worse with each epoch. Are these issues covered later in the course?

cmeff1 · January 12, 2017, 1:35am

So…I’m still working out how to handle this memory issue. It is due to the concatenation of the arrays in the training data. I have stood up my AWS instance and pushed through on there. htop reports 55 GB of memory consumption for the get_data function. So I will continue to try to figure out how to manage this on my personal machine, but will also push forward on the AWS machine so I can focus on the ML and not get too hung up on how to manage the memory.

My new question is, is it normal to need so much memory? Are there better ways to manage this?
So that is my update. I’m still open to ideas on how to deal with such a large array on my local machine. One suggestion was to save each individual array and then do the concatenation on disk. My other thought was just to increase the swap space on my box. I would greatly appreciate any further suggestions. Thanks for your help!

PS this is really cool stuff!
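
On the "concatenate on disk" idea, one possible sketch uses a disk-backed NumPy array that is filled one batch at a time, so the full data set never has to sit in RAM as a Python list. The names batches_to_npy and fake_batches are hypothetical, the tiny generator only stands in for trn_data_a, and float32 images are an assumption.

```python
import os
import tempfile
import numpy as np

def batches_to_npy(batches, n_images, item_shape, filename, dtype=np.float32):
    """Stream batches into a disk-backed .npy file instead of a list in RAM."""
    out = np.lib.format.open_memmap(
        filename, mode="w+", dtype=dtype, shape=(n_images,) + tuple(item_shape))
    start = 0
    for batch in batches:
        batch = np.asarray(batch)[: n_images - start]
        out[start:start + len(batch)] = batch
        start += len(batch)
        if start >= n_images:  # Keras-style generators loop forever
            break
    if start < n_images:
        raise ValueError("generator ended after %d of %d images" % (start, n_images))
    out.flush()
    del out  # release the write handle on the file
    return filename

# Tiny stand-in for trn_data_a: 10 "images" of shape (3, 4, 4), one per batch.
def fake_batches(n, item_shape):
    for i in range(n):
        yield np.full((1,) + item_shape, i, dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "trn_data.npy")
batches_to_npy(fake_batches(10, (3, 4, 4)), 10, (3, 4, 4), path)

# mmap_mode="r" reads pages from disk on demand rather than loading everything.
trn_data = np.load(path, mmap_mode="r")
print(trn_data.shape, trn_data[7, 0, 0, 0])
```

Whether this is fast enough for training depends on the disk; it mainly trades memory for I/O.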

kelin-christi · January 12, 2017, 9:12pm

As it turns out, it was just a glitch. I tried playing with every setting I could, but to no avail! If anybody experiences what I just experienced, I’m afraid you will have to start a new p2 instance.

Best!

Steve · January 14, 2017, 4:16pm

FWIW this worked for me as well – thx @jbrown81

wgpubs · January 14, 2017, 9:16pm

How did the authors of VGG build, for example, layer 1 of their model so as to be able to identify edges?

Did they essentially build a NN of images that just has a bunch of different 7x7 pixel images of edges?

Also, when they build the subsequent layers of their model, did they do so on top of existing layers or were they trained independently?

Thanks

Gelu74 · January 14, 2017, 11:55pm

@wgpubs lesson 3 will probably help you understand this better. Basically, the beauty of deep learning is that the network learns which features to use. There is no feature engineering: the only inputs to the model are the ImageNet training images (and, obviously, the design of the network architecture and the hyperparameters to tune).

jeremy · January 16, 2017, 12:42am

When you switch from using get_data() to get_batches() you’ll have to change the rest of the code to use batches instead of a numpy array. So, as you found, you can’t use .shape. But you should be able to avoid .shape, since with batches you can simply call fit_generator. See https://github.com/fchollet/keras/issues/1627 for a discussion of this method, if you’re having trouble with it, or search our github repo for ‘fit_generator’ to see lots of examples of its use in our notebooks.

cmeff1 · January 17, 2017, 3:54am

Jeremy,

Looking through the samples as well as reading the link you sent, I have more questions. Initially I got the error that “model” needs to be compiled first. Reviewing the samples on GitHub, I see there is a compile line. In lesson 2, model = vgg.model. In most of the GitHub samples I notice that model is defined, compiled, and then fit_generator is run. In this case, is it still necessary to define model? It seems it is already defined via model = vgg.model. Also, in the compile statements I notice Adam() as the first param. In the link you sent, it seems that the first param is a generator; what then is Adam()? Thanks for all your help.

-Chris
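
A toy illustration of the split being asked about, written in plain NumPy rather than Keras: compile() receives the optimizer (the rule for updating weights, which is the role Adam() plays), while fit_generator() receives the generator that supplies the data. TinyLinearModel, PlainSGD, and batch_generator are hypothetical stand-ins, not Keras classes, and the real Keras internals are more involved.

```python
import numpy as np

class PlainSGD(object):
    """Stand-in for an optimizer such as SGD() or Adam(): a weight-update rule."""
    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, weights, grads):
        return [w - self.lr * g for w, g in zip(weights, grads)]

class TinyLinearModel(object):
    """Toy linear model that mimics the compile / fit_generator split."""
    def __init__(self, n_inputs):
        self.weights = [np.zeros(n_inputs), np.zeros(1)]
        self.optimizer = None

    def compile(self, optimizer):
        # compile() is told HOW the weights get updated.
        self.optimizer = optimizer

    def fit_generator(self, generator, samples_per_epoch, nb_epoch):
        # fit_generator() is told WHERE the training data comes from.
        if self.optimizer is None:
            raise RuntimeError("call compile() first")
        for _ in range(nb_epoch):
            seen = 0
            while seen < samples_per_epoch:
                x, y = next(generator)
                w, b = self.weights
                err = x.dot(w) + b - y  # prediction error for this batch
                grads = [2 * x.T.dot(err) / len(x), np.array([2 * err.mean()])]
                self.weights = self.optimizer.step(self.weights, grads)
                seen += len(x)

def batch_generator(x, y, batch_size):
    """Yield (inputs, targets) batches forever, like a Keras-style generator."""
    while True:
        for i in range(0, len(x), batch_size):
            yield x[i:i + batch_size], y[i:i + batch_size]

rng = np.random.RandomState(0)
x = rng.random_sample((30, 2))
y = x.dot([2., 3.]) + 1.

model = TinyLinearModel(n_inputs=2)
model.compile(PlainSGD(lr=0.1))
model.fit_generator(batch_generator(x, y, batch_size=5),
                    samples_per_epoch=30, nb_epoch=500)
print(model.weights)
```

In this sketch the model object exists before compile() is called, which mirrors the model = vgg.model situation: the model is already defined, and compiling only attaches the update rule.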

wgpubs · January 17, 2017, 5:11am

In the Lesson 2 notebook we have this code:

x = random((30,2))
y = np.dot(x, [2., 3.]) + 1.

lm = Sequential([ Dense(1, input_shape=(2,)) ])
lm.compile(optimizer=SGD(lr=0.1), loss='mse')

lm.fit(x, y, nb_epoch=5, batch_size=1)

Am I understanding how SGD is working here if I say that the code above iterates through the 30 training examples 5 times, using only a single example per iteration to evaluate the loss function and update the weights (parameters)?

… OR …

Is it right to say that the code above iterates through the 30 training examples 5 times, going through one example after another during each iteration, evaluating the loss and updating the weights?

Thanks

Matthew · January 18, 2017, 4:22pm

My claim:

This fit function call will update the weights 150 times.

(30 / 1) * 5

(len(x) / batch_size) * nb_epoch

Why I believe this:

I looked up the documentation for the fit function and saw that the description for the batch_size parameter said:

“Number of samples per gradient update.”

I assumed that “gradient update” meant not only updating the gradients but also using the updated gradients to update the weights.

Questions for others:

Am I correct?
Would setting batch_size equal to the size of the training set be equivalent to gradient descent (i.e. not stochastic gradient descent)?
How can I verify / falsify my claim experimentally?
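
One way to check the count experimentally is to re-implement the loop by hand and increment a counter on every weight update. This is a plain NumPy sketch of mini-batch SGD on the same kind of data as the notebook snippet, not Keras itself; sgd_fit is a hypothetical name, and unlike Keras it does not shuffle, which should not affect the count.

```python
import numpy as np

def sgd_fit(x, y, nb_epoch, batch_size, lr=0.1):
    """Mini-batch SGD on a linear model; returns weights, bias, update count."""
    w, b = np.zeros(x.shape[1]), 0.0
    n_updates = 0
    for _ in range(nb_epoch):
        for start in range(0, len(x), batch_size):
            xb = x[start:start + batch_size]
            yb = y[start:start + batch_size]
            err = xb.dot(w) + b - yb              # error on this batch only
            w = w - lr * 2 * xb.T.dot(err) / len(xb)
            b = b - lr * 2 * err.mean()
            n_updates += 1                        # one update per batch
    return w, b, n_updates

rng = np.random.RandomState(0)
x = rng.random_sample((30, 2))
y = x.dot([2., 3.]) + 1.

_, _, n_sgd = sgd_fit(x, y, nb_epoch=5, batch_size=1)
_, _, n_full = sgd_fit(x, y, nb_epoch=5, batch_size=30)
print(n_sgd, n_full)
```

With batch_size=1 the counter reaches 150, matching (30 / 1) * 5. With batch_size=30 there is one update per epoch, each using the gradient over the whole training set, which is ordinary (non-stochastic) gradient descent. In Keras itself, a callback that increments a counter in on_batch_end should give a comparable check.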

wgpubs · January 18, 2017, 8:26pm

Did some investigation and resolved the definitions as follows:

“epoch” = A run through all the training examples

“batch size” = The number of training examples to use at a time to evaluate the loss function and update the weights.

Example: You have 1000 training examples, 5 epochs, and a batch size of 25.

Your system will evaluate the loss function and update the weights 40 times per epoch, using 25 training examples at a time (i.e., 1000/25 = 40). This means that your “fit” function will take the first 25 examples and do forward/back propagation, then it will take the next 25 and do the same, and so on until it has seen every training example. Once this has been done 40 times, an epoch will have been completed.
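
The arithmetic can be spelled out by listing the index ranges that each batch covers in one epoch. This is a small sketch with a hypothetical helper, batch_ranges; it also shows the case where the batch size does not divide the data set evenly, so the last batch is smaller.

```python
import math

def batch_ranges(n_samples, batch_size):
    """(start, stop) index pairs covering one epoch, in order."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

ranges = batch_ranges(1000, 25)
print(len(ranges), ranges[0], ranges[-1])  # batches per epoch, first and last
print(len(ranges) * 5)                     # total updates over 5 epochs

# When 1000 is not a multiple of the batch size, the count rounds up.
uneven = batch_ranges(1000, 30)
print(len(uneven), uneven[-1], int(math.ceil(1000 / 30.0)))
```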

Matthew · January 19, 2017, 6:44pm

I think the system would evaluate the loss function and update the weights using 25 training examples at a time until all the training examples have been seen. This would mean that there would be 40 updates per epoch, since there are 1000 training examples per epoch and 25 training examples per update and 1000/25 is 40.

wgpubs · January 21, 2017, 3:08am

Ha … you’re right. I meant what you are saying even though I put the wrong number in there. Updating now and thanks for the catch!

wgpubs · January 21, 2017, 6:12am

I have two questions regarding the relationship between learning rate and the number of layers being trained, specifically concerning this comment from the last section in the lesson 2 notebook:

“Since we’re training more layers, and since we’ve already optimized the last layer, we should use a lower learning rate than previously”

Why should we use a lower learning rate if training more layers?
Should we use a lower and lower learning rate for each additional layer we train? So if I train 2 layers then maybe I set it to 0.1, and if 3 layers then set it to 0.01, and if 4 layers then 0.03, and so forth. Or is the advice to lower the learning rate to a fixed value if you train more than one layer?

wgpubs · January 22, 2017, 12:49am

Question re: using bcolz to save processed arrays so we don’t have to load and resize images every time we want to use them.

Is it also standard or recommended practice to save the training/validation classes array and one hot encoded labels as well since without this data, the images array would be useless?
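
One possible way to keep everything together is to save the image array, the class ids, and the one-hot labels side by side in a single file. The sketch below uses NumPy's np.savez in place of bcolz (a swapped-in technique, chosen only to keep the example self-contained); the arrays are tiny stand-ins, and onehot here is a hypothetical helper rather than the notebook's own function.

```python
import os
import tempfile
import numpy as np

def onehot(classes, n_classes):
    """Turn integer class ids into one-hot rows."""
    return np.eye(n_classes, dtype=np.float32)[classes]

# Stand-ins for the real arrays: 6 tiny "images" and their class ids.
images = np.zeros((6, 3, 4, 4), dtype=np.float32)
classes = np.array([0, 1, 1, 0, 1, 0])
labels = onehot(classes, n_classes=2)

# Save all three arrays under named keys in one .npz file.
path = os.path.join(tempfile.mkdtemp(), "train_arrays.npz")
np.savez(path, images=images, classes=classes, labels=labels)

# Reload and confirm the images and labels still line up row for row.
loaded = np.load(path)
assert loaded["images"].shape[0] == loaded["labels"].shape[0]
print(loaded["classes"], loaded["labels"][1])
```

Saving the labels with the images only helps if the generator that produced the images ran with shuffle=False, so that the row order matches the class order.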

wgpubs · January 23, 2017, 3:17am

How come we have to re-compile the model after setting the trainable property of layer(s) = False, but not when setting them to True?

Under the “Training Multiple Layers in Keras” section in the lesson 2 notebook there is this code …

layers = model.layers
# Get the index of the first dense layer...
first_dense_idx = [index for index,layer in enumerate(layers) if type(layer) is Dense][0]
# ...and set this and all subsequent layers to trainable
for layer in layers[first_dense_idx:]: layer.trainable=True

… and then the comment …

Since we haven’t changed our architecture, there’s no need to re-compile the model - instead, we just set the learning rate. Since we’re training more layers, and since we’ve already optimized the last layer, we should use a lower learning rate than previously.

But the Keras documentation says:

Additionally, you can set the trainable property of a layer to True or False after instantiation. For this to take effect, you will need to call compile() on your model after modifying the trainable property.

So I’m confused. The Keras docs seem to indicate that the model needs to be compiled any time the trainable property of its layer(s) is changed … whereas the comment in the notebook seems to indicate this isn’t the case if you are setting the property to True.

Thanks

stella · January 26, 2017, 4:07pm

I ran into the same error. After checking the function get_data in utils, it seems that get_data expects a path instead of a batch (a DirectoryIterator). It worked after I changed the code to
val_data = get_data(path+'valid')
trn_data = get_data(path+'train')

anamariapopescug · January 26, 2017, 5:00pm

yep, that’d been the issue, i’d fixed it :). this is an old question (from November) - but i’d put a fix somewhere on the wiki, it’s nice someone listed one here where the question was - thanks!