Lesson 3 discussion


(Angel) #81

@xinxin.li.seattle In the random initialization


(Xinxin) #82

That’s exactly where I am lost!
I couldn’t find the random initialization in the code (I tried searching for “rand” and “random”).

Would you be so kind as to point me to that line of code?
[https://github.com/fastai/courses/blob/master/deeplearning1/nbs/mnist.ipynb]


(Angel) #83

Whenever you define a layer of the model, it is initialized with weights. Each type of layer has a default way of initializing its weights, which you can change as described here:
https://keras.io/initializations/
Those initializers are where the randomness is introduced.


(Xinxin) #84

Aha!! I looked at Jeremy’s code, and I couldn’t find the explicit initialization as in ‘VGG-style’ CNN (attached below). Does that mean it’s initialized with a default distribution? If so, what is the default? uniform? normal?

def get_model():
    model = Sequential([
        Lambda(norm_input, input_shape=(1,28,28)),
        Convolution2D(32,3,3, activation='relu'),
        Convolution2D(32,3,3, activation='relu'),
        MaxPooling2D(),
        Convolution2D(64,3,3, activation='relu'),
        Convolution2D(64,3,3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model


(Xinxin) #85

Never mind, found my own answer.

“?? Convolution2D” gives me the following:

Init signature: Convolution2D(self, nb_filter, nb_row, nb_col, init='glorot_uniform', activation='linear', weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, **kwargs)
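
For reference, `glorot_uniform` draws each weight from a uniform distribution on [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)). The snippet below is a minimal standard-library sketch of that rule, not Keras's own code; the function name `glorot_uniform_sketch` is made up for illustration, and the fan values assume the first Convolution2D layer of get_model (1 input channel, 32 filters, 3x3 kernels).

```python
import math
import random

def glorot_uniform_sketch(fan_in, fan_out, n_weights, seed=0):
    """Draw n_weights values uniformly from [-limit, limit]."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)
    return [rng.uniform(-limit, limit) for _ in range(n_weights)], limit

# Assumed example: a 3x3 convolution with 1 input channel and 32 filters.
kernel_area = 3 * 3
fan_in = 1 * kernel_area      # input channels * kernel area
fan_out = 32 * kernel_area    # filters * kernel area
weights, limit = glorot_uniform_sketch(fan_in, fan_out, n_weights=32 * 1 * kernel_area)
```

Because the limit shrinks as the fans grow, wider layers start with smaller weights, which keeps the scale of the activations roughly stable from layer to layer.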


(Sean) #86

I am stuck at the same point, and I have also run into a problem with vgg16bn.

The network we are building is just the top of vgg16bn anyway, so jumping straight to vgg16bn seems a logical workaround.

However, if I run:

from vgg16bn import *
vgg = Vgg16BN()

I get the following error:


ValueError Traceback (most recent call last)
in ()
1 from vgg16bn import *
----> 2 vgg = Vgg16BN()
3 #model = vgg.model
4 #from vgg16 import *
5 #vgg = Vgg16()

/home/anaconda3/envs/python2/nbs/lesson2/vgg16bn.pyc in __init__(self, size, include_top)
31 def __init__(self, size=(224,224), include_top=True):
32 self.FILE_PATH = "/home/anaconda3/envs/python2/nbs/lesson1/data/kaggle/"
---> 33 self.create(size, include_top)
34 self.get_classes()
35

/home/anaconda3/envs/python2/nbs/lesson2/vgg16bn.pyc in create(self, size, include_top)
89
90 fname = 'vgg16_bn.h5'
---> 91 model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))
92
93

/home/anaconda3/envs/python2/lib/python2.7/site-packages/keras/utils/data_utils.pyc in get_file(fname, origin, untar, md5_hash, cache_subdir)
111 try:
112 urlretrieve(origin, fpath,
--> 113 functools.partial(dl_progress, progbar=progbar))
114 except URLError as e:
115 raise Exception(error_msg.format(origin, e.errno, e.reason))

/home/anaconda3/envs/python2/lib/python2.7/site-packages/keras/utils/data_utils.pyc in urlretrieve(url, filename, reporthook, data)
48 yield chunk
49
---> 50 response = urlopen(url, data)
51 with open(filename, 'wb') as fd:
52 for chunk in chunk_read(response, reporthook=reporthook):

/home/anaconda3/envs/python2/lib/python2.7/urllib2.pyc in urlopen(url, data, timeout, cafile, capath, cadefault, context)
152 else:
153 opener = _opener
--> 154 return opener.open(url, data, timeout)
155
156 def install_opener(opener):

/home/anaconda3/envs/python2/lib/python2.7/urllib2.pyc in open(self, fullurl, data, timeout)
419
420 req.timeout = timeout
--> 421 protocol = req.get_type()
422
423 # pre-process request

/home/anaconda3/envs/python2/lib/python2.7/urllib2.pyc in get_type(self)
281 self.type, self.__r_type = splittype(self.__original)
282 if self.type is None:
--> 283 raise ValueError, "unknown url type: %s" % self.__original
284 return self.type
285

ValueError: unknown url type: /home/anaconda3/envs/python2/nbs/lesson1/data/kaggle/vgg16_bn.h5

Things I have checked:
- The vgg16_bn.h5 file is present at the required location (and in a few other places as well, for redundancy).
- The path itself is valid.
- Replacing Vgg16BN with Vgg16 runs normally with the same path.
- Changing the directory from which I launch the notebook does not change this.

Has anyone else had this problem, and if so, how did you solve it?


(Teemu Kurppa) #87

Did you modify FILE_PATH yourself? If so, try adding file:// at the beginning of the path.
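
The traceback shows why this helps: `get_file` hands its `origin` argument to a URL opener, and a bare filesystem path has no scheme such as `http://` or `file://`. Here is a minimal Python 3 sketch of the difference, using only the standard library. The path is the one from the error message and is treated purely as a string, so nothing is read from disk.

```python
from urllib.parse import urlparse

weights_path = "/home/anaconda3/envs/python2/nbs/lesson1/data/kaggle/vgg16_bn.h5"

# A bare path has no URL scheme, which is what "unknown url type" complains about.
bare_scheme = urlparse(weights_path).scheme

# Prefixing file:// turns the absolute path into a URL a URL opener can recognise.
file_url = "file://" + weights_path
file_scheme = urlparse(file_url).scheme
```

Since the file is already on disk, passing the path directly to `model.load_weights(...)` and skipping `get_file` would presumably work as well. That is a suggestion, not something confirmed in this thread.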


#88

There are five steps to take once you are overfitting. But for two of them it’s not really clear to me how they translate to what Jeremy has been teaching us in the videos.

  • Use architectures that generalize well
    What kind of architectures generalize well?

  • Reduce architecture complexity
    Given VGG, how could we reduce its complexity to make it less prone to overfitting?

I don’t really know how to answer these two questions.


(Vishnu Subramanian) #89

For your first question, on architecture: start with a very simple linear model, then for image recognition tasks add convolution layers, max pooling and dropout. Basically, get inspired by looking at how other architectures are designed. If the problem is similar to ImageNet, you can use transfer learning by fine-tuning an existing model. You can decide how many layers to train; in practice, training just the dense layers works well.

For your second question, on complexity: a model is less complex when it has fewer nodes or layers, so there are fewer parameters to learn. The more layers and nodes you add, the more chance the model has to memorize the entire input data, in which case it does not generalize to unseen data. Reducing the number of layers, or of nodes in each layer, forces the network not to memorize.

Hope it helps.
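
To make the point about complexity concrete: a dense layer with n_in inputs and n_out outputs has n_in * n_out weights plus n_out biases. The sketch below assumes the VGG16 fully connected block receives a flattened 512 x 7 x 7 input; the narrower 512-unit variant is only an illustrative choice, not a recommendation from the course.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def stack_params(sizes):
    """Total parameters of dense layers chained through the given sizes."""
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))

flattened = 512 * 7 * 7   # inputs after the last pooling layer (assumed shape)

vgg_style = stack_params([flattened, 4096, 4096, 2])   # two 4096-unit layers
narrower = stack_params([flattened, 512, 512, 2])      # hypothetical slimmer block
```

Under these assumptions the slimmer block has roughly a ninth of the parameters, almost all of the saving coming from the first dense layer.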


#90

Thanks Vishnu, that helps!


(Elizabeth) #91

Hey all,

Really struggling with some parts of lesson 3.

First, in order to load weights from the fine-tuned model in lesson 2, I had to pop a layer before adding the Dense(2, activation='softmax') layer. Does this make sense? Otherwise I end up with 17 layers instead of 16.

new_model = vgg_ft(2) #create a model with a binary classifier
new_model.pop()
new_model.add(Dense(2,activation = 'softmax')) 

Later, when trying to load in weights, I get this error

---------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-244-e9cf85a7e40b> in <module>()
----> 1 fc_model = get_fc_model()

<ipython-input-243-24066d99319f> in get_fc_model()
     10         ])
     11 
---> 12     for l2,l3 in zip(nmodel.layers, fc_layers): l2.set_weights(wghts(l3))
     13 
     14     nmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in set_weights(self, weights)
    873         '''
    874         params = self.weights
--> 875         if len(params) != len(weights):
    876             raise Exception('You called `set_weights(weights)` on layer "' + self.name +
    877                             '" with a  weight list of length ' + str(len(weights)) +

TypeError: object of type 'generator' has no len()

But when I call type(wghts) it says it’s a function (as it should be), so I can’t figure out what it means by generator…

Here’s the rest of my (Jeremy’s) code:

def wghts(layer): return (w/2 for w in layer.get_weights())

def get_fc_model():
    nmodel = Sequential([
            MaxPooling2D(input_shape = conv_layers[-1].output_shape[1:]),
            Flatten(),
            Dense(4096, activation = 'relu'),
            Dropout(p = 0.6),
            Dense(4096, activation = 'relu'),
            Dropout(p = 0.6),
            Dense(2, activation = 'softmax')
        ])
    
    for l2,l3 in zip(nmodel.layers, fc_layers): l2.set_weights(wghts(l3))

    nmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return nmodel

fc_model = get_fc_model()

I should add that if I comment out the for l2, l3... line, my validation accuracy is awful: 50%, down from 85%, on a test set of my own images for a wet vs. dry ID problem.


(Vishnu Subramanian) #92

Hi Elizabeth,

Let me first answer your question about the model. If you take a look at utils.py in the GitHub repo, vgg_ft returns a model whose last layer has already been removed and replaced with a new dense layer with a softmax activation. So you do not need to pop the layer and add a new one yourself.

Regarding your second question, about the weights: you can use the weights saved from the previous task, where you fine-tuned the model. The original VGG model uses a dropout of 0.5, which could be overkill for our problem, so Jeremy tries to avoid that dropout in his class. Instead of training the weights from random values, we take the weights from the previous experiment and halve them, since we are no longer using dropout. If you want to try a dropout of 0.6, you may not be able to take advantage of the previous weights. Increasing the dropout may not be a great idea anyway, since the training accuracy is lower than the validation accuracy; basically, the model is underfitting.

Hope it helps.

Thanks,
Vishnu Subramanian


(Amy) #93

Hmm… I am still not fully understanding the rationale behind halving weights when we want to load the convolutional outputs from Vgg to a new fully-connected model.

From my understanding…
In the original Vgg16, there are dropout layers set to 0.5 in the fully-connected blocks. That means out of 4096 outputs (from the Dense layer), 2048 weights are temporarily deactivated during training each time dropout is utilized.

To deal with having double the # of weights as we remove dropout in our newly constructed model, the notes Wiki mentions:

and therefore a valid method of alleviating this imbalance is to simply divide the dense Vgg in half when loading them into our new model

How does dividing each weight value by half exactly relate to having half of the weights zero’d out during training with dropout? Why do we do this instead of getting rid of half of the number of weights we’re loading in (since dropout gets rid of 50%), and zeroing out the rest?

Is this just a heuristic to initialize the weight closer to where they’re supposed to be before fitting?


(Elizabeth) #94

Hey Vishnu! Thanks for the quick reply.

Ignore the p values… was just playing around with those…

What’s weird is the TypeError I’m getting in regard to the weights parameter. I can’t figure out why it’s being classified as a generator.
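
The likely cause is the parentheses in `wghts`: `(w/2 for w in ...)` is a generator expression, so calling the function returns a generator object, and `set_weights` then fails on `len(weights)`. Square brackets build a list instead. Here is a minimal sketch without Keras, where `FakeLayer` is a hypothetical stand-in for a layer and plain floats stand in for the weight arrays.

```python
class FakeLayer:
    """Hypothetical stand-in for a Keras layer; real layers return NumPy arrays."""
    def get_weights(self):
        return [4.0, 2.0]

def wghts_generator(layer):
    # Parentheses: returns a lazy generator, which has no len()
    return (w / 2 for w in layer.get_weights())

def wghts_list(layer):
    # Square brackets: returns a list, which has a length and can be indexed
    return [w / 2 for w in layer.get_weights()]

layer = FakeLayer()

try:
    len(wghts_generator(layer))
    generator_error = None
except TypeError as err:
    generator_error = str(err)   # same complaint as in the traceback

halved = wghts_list(layer)
```

This also explains why `type(wghts)` reports a function: the function itself is fine, and it is its return value that is a generator.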


(Vishnu Subramanian) #95

Please share your code in a gist so that we can take a look at it, and indicate the point at which you get the error.


#96

Towards the end of the video for lesson 3, @jeremy mentions that there is a strategy used nowadays for adding dropout, but I think he says he described it earlier. I might have missed it, or maybe it was shared during Part 1 via some other medium.

In general, what does this strategy entail? If I understood correctly, the idea is to start with no dropout and add as little as needed to the point where we are no longer overfitting? If I am reading this right, on the 5 point list that @jeremy shared with us in notebook for lesson 3, adding regularization is #4, so this is something that should be tried as a last resort to some extent? (dropout would be considered a form of regularization?)


#97

I’m applying the approach of the mnist notebook to the Kaggle leaf classification competition (just the images, not the features). I notice that when I add batch normalization, it runs a lot slower (about 7 times longer per epoch) and gives worse results. What could cause that? Is the dataset too simple?


(George) #98

Re large images and foveation. @jeremy (love the course, though I’m following it after the real-time classes.)

I liked your comment in lecture 3 about how disappointing it is that people just down-sample big images, and that something related to the way eyes scan pictures for relevance is likely to be the right way to go.

I have a suspicion that the real trick would be to break images down like foveation, but structured in one or more rooted trees. I’m basing this idea on a new paper by Lin and Tegmark about why these neural networks work so well (https://arxiv.org/pdf/1608.08225v2.pdf). They argue that things in Nature arise from generative processes controlled by only a handful of parameters, which means things that learn the original model can more easily learn the next level of (perhaps physical or evolutionary) generations.

I suspect that most large images we look at include natural objects that were generated in multiple steps of a generative model (embryology in living things; geologic and physical processes in non-living objects).

If the eye searches for something like rooted trees and then processes the leaves in smaller, related batches, then I would suspect big image understanding might benefit from doing something similar.


(Amy) #99

For anyone else that may have had the same question, I think I found an explanation in the lesson 3 note in the wiki, in which a clarification is made between classical dropout (which is the method of halving the weights demonstrated in the lesson 3 ipynb), and how Keras handles dropout (which renders the rescaling of weights demonstrated in the notebook actually not necessary).

With respect to my question of “how does dividing each weight value by half relate to dropout with a probability of 0.5”, the gist of it is that (from lesson 3 notes):

if during train time we were teaching our network to predict in the subsequent layer utilizing only 50% of its weights, now that it has all the weights at its disposal the contribution of each weight needs to only be half as big!

I must say I’m still not 100% on the exact mechanism of rescaling to compensate for dropout, but I’ll accept this intuitive explanation for now!
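
One way to see the rescaling, as a sketch of classical (non-inverted) dropout rather than of what Keras does internally: if each input to a unit is kept with probability 0.5 during training, the unit's average pre-activation over all dropout masks is half of what it would be with every input present. Halving the weights at test time reproduces that average. The activations and weights below are made-up numbers.

```python
from itertools import product

activations = [1.0, 2.0, 3.0, 4.0]     # made-up inputs to one unit
weights = [0.5, -1.0, 0.25, 2.0]       # made-up trained weights

def pre_activation(acts, wts):
    """Weighted sum of the inputs, before any nonlinearity."""
    return sum(a * w for a, w in zip(acts, wts))

# Average over every keep/drop mask; with p = 0.5 all masks are equally likely.
masks = list(product([0, 1], repeat=len(activations)))
train_average = sum(
    pre_activation([a * m for a, m in zip(activations, mask)], weights)
    for mask in masks
) / len(masks)

# Test time without dropout: all inputs present, weights halved.
test_halved = pre_activation(activations, [w / 2 for w in weights])
```

The two quantities agree because each input appears in exactly half of the masks, so halving the weights is an exact match for the average, at least before the nonlinearity is applied.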


#100

Have you figured out how to write the ensemble MNIST model’s predictions to a CSV file so that they can be submitted to Kaggle? I tried savetxt with a delimiter and it returned an error.
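
Here is a minimal sketch of one way to do this with the standard `csv` module instead of `savetxt`: average the class probabilities of the ensemble members, take the most likely class per image, and write one row per image. The `ImageId,Label` header and the 1-based ids are assumptions about the Kaggle submission format, and the probabilities are made-up stand-ins for each model's prediction output.

```python
import csv
import io

# Made-up predictions: 2 models x 3 images x 3 classes
model_preds = [
    [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]],
    [[0.2, 0.5, 0.3], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6]],
]

def ensemble_labels(preds):
    """Average probabilities across models, then pick the argmax class per image."""
    n_models = len(preds)
    labels = []
    for per_image in zip(*preds):          # one tuple of model outputs per image
        avg = [sum(col) / n_models for col in zip(*per_image)]
        labels.append(avg.index(max(avg)))
    return labels

def write_submission(labels, fileobj):
    """Write a header row followed by one (id, label) row per image."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])  # assumed header
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, label])

labels = ensemble_labels(model_preds)
buffer = io.StringIO()   # swap for open("submission.csv", "w", newline="") to write a file
write_submission(labels, buffer)
```

With NumPy arrays, the same averaging is a mean over the model axis followed by an argmax over the class axis.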