Thanks @VishnuSubramanian. I did do my homework and read these before I came here to ask. I’m aware that vgg_bn.py exists and that I could use it for the same purpose, but in this case I’m trying to learn how Jeremy did it with the original vgg.py only. His dogscats ensemble notebook doesn’t mention vgg_bn.py.
In other words, my question is simple: why, when I run the original dogscats ensemble notebook, don’t I get the same results he does? I get about 50% accuracy after running 13 epochs, whereas he gets above 97%.
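If the sample set is roughly balanced, about 50% is what a model that learned nothing would score on two classes. A minimal sketch of that baseline, assuming one-hot labels shaped like val_labels (`majority_baseline` and the toy labels are hypothetical, not from the notebook):

```python
def majority_baseline(onehot_labels):
    """Accuracy obtained by always predicting the most frequent class."""
    n_classes = len(onehot_labels[0])
    counts = [0] * n_classes
    for row in onehot_labels:
        # Position of the largest entry in a one-hot row is the class index
        counts[list(row).index(max(row))] += 1
    return max(counts) / float(len(onehot_labels))

# Toy labels (hypothetical): 4 cats followed by 4 dogs
toy_labels = [[1, 0]] * 4 + [[0, 1]] * 4
baseline = majority_baseline(toy_labels)
```

Running the same helper on the real val_labels would show whether 50% really is the do-nothing score for this sample set.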
The code I used is as follows; I believe it is the same as his notebook. For ease of reading, I’ve exported it from .ipynb to .py:
# coding: utf-8
# In[50]:
from theano.sandbox import cuda
cuda.use('gpu0')
# In[51]:
get_ipython().magic(u'matplotlib inline')
import utils; reload(utils)
from utils import *
from __future__ import division, print_function
# ## Setup
# In[52]:
path = "data/dogscats/sample/"
model_path = 'data/dogscats/models/'
if not os.path.exists(model_path): os.mkdir(model_path)
batch_size=64
# In[53]:
batches = get_batches(path+'train', shuffle=False, batch_size=batch_size)
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)
# In[54]:
(val_classes, trn_classes, val_labels, trn_labels,
    val_filenames, filenames, test_filenames) = get_classes(path)
# In this notebook we're going to create an ensemble of models and use their average as our predictions. For each ensemble, we're going to follow our usual fine-tuning steps:
#
# 1) Create a model that retrains just the last layer
# 2) Add this to a model containing all VGG layers except the last layer
# 3) Fine-tune just the dense layers of this model (pre-computing the convolutional layers)
# 4) Add data augmentation, fine-tuning the dense layers without pre-computation.
#
# So first, we need to create our VGG model and pre-compute the output of the conv layers:
# In[55]:
model = Vgg16().model
conv_layers,fc_layers = split_at(model, Convolution2D)
# In[56]:
conv_model = Sequential(conv_layers)
# In[57]:
val_features = conv_model.predict_generator(val_batches, val_batches.nb_sample)
trn_features = conv_model.predict_generator(batches, batches.nb_sample)
# In[58]:
save_array(model_path + 'train_convlayer_features.bc', trn_features)
save_array(model_path + 'valid_convlayer_features.bc', val_features)
# In the future we can just load these precomputed features:
# In[59]:
trn_features = load_array(model_path+'train_convlayer_features.bc')
val_features = load_array(model_path+'valid_convlayer_features.bc')
# We can also save some time by pre-computing the training and validation arrays with the image decoding and resizing already done:
# In[60]:
trn = get_data(path+'train')
val = get_data(path+'valid')
# In[61]:
save_array(model_path+'train_data.bc', trn)
save_array(model_path+'valid_data.bc', val)
# In the future we can just load these resized images:
# In[62]:
trn = load_array(model_path+'train_data.bc')
val = load_array(model_path+'valid_data.bc')
# Finally, we can precompute the output of all but the last dropout and dense layers, for creating the first stage of the model:
# In[63]:
model.pop()
model.pop()
# In[64]:
ll_val_feat = model.predict_generator(val_batches, val_batches.nb_sample)
ll_feat = model.predict_generator(batches, batches.nb_sample)
# In[65]:
save_array(model_path + 'train_ll_feat.bc', ll_feat)
save_array(model_path + 'valid_ll_feat.bc', ll_val_feat)
# In[66]:
ll_feat = load_array(model_path+ 'train_ll_feat.bc')
ll_val_feat = load_array(model_path + 'valid_ll_feat.bc')
# ...and let's also grab the test data, for when we need to submit:
# In[67]:
test = get_data(path+'test')
save_array(model_path+'test_data.bc', test)
# In[68]:
test = load_array(model_path+'test_data.bc')
# ## Last layer
# The functions automate creating a model that trains the last layer from scratch, and then adds those new layers on to the main model.
# In[71]:
def get_ll_layers():
    return [
        BatchNormalization(input_shape=(4096,)),
        Dropout(0.5),
        Dense(2, activation='softmax')
    ]
# In[83]:
def train_last_layer(i):
    # A few cells above, we dropped the last two layers of VGG and saved the
    # model's output as features (ll_feat, ll_val_feat).
    # Here we train the replacement layers on those features (adding batchnorm
    # and changing the number of outputs).
    # The purpose is to fine-tune the last dense layer, with batchnorm.
    ll_layers = get_ll_layers()
    ll_model = Sequential(ll_layers)
    ll_model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    ll_model.optimizer.lr=1e-5
    ll_model.fit(ll_feat, trn_labels, validation_data=(ll_val_feat, val_labels), nb_epoch=12)
    ll_model.optimizer.lr=1e-7
    ll_model.fit(ll_feat, trn_labels, validation_data=(ll_val_feat, val_labels), nb_epoch=12)
    ll_model.save_weights(model_path+'ll_bn' + i + '.h5')

    # Here we take VGG16 and pop the last dense layer, the dropout above it,
    # and the regular dense layer above that (three pops in total).
    vgg = Vgg16()
    model = vgg.model
    model.pop(); model.pop(); model.pop()
    print (model.summary())
    for layer in model.layers: layer.trainable=False
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

    # Append fresh replacement layers and copy in the weights trained above
    ll_layers = get_ll_layers()
    for layer in ll_layers: model.add(layer)
    for l1,l2 in zip(ll_model.layers, model.layers[-3:]):
        l2.set_weights(l1.get_weights())
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    model.save_weights(model_path+'bn' + i + '.h5')
    return model
# In[84]:
train_last_layer('test')
I’d also like to ask why model.pop() is called three times in the definition of train_last_layer(i). I thought popping twice would be enough, since the precomputed features were produced after popping only the last two layers, not three. That doesn’t seem to be why the accuracy is so low, though: I tried changing the definition to pop only twice, and the result did not improve.
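To make the pop question concrete, here is a minimal sketch of how I understand the top of the VGG16 stack and what is left after two versus three pops. The layer list is my assumption about what the original vgg.py builds (Flatten, two Dense(4096) + Dropout blocks, then Dense(1000)), and `remaining_after_pops` is a hypothetical helper that only mimics model.pop() on a plain list; it does not touch Keras.

```python
# Assumed order of the layers at the top of VGG16 (not read from a real model)
vgg_top = ["Flatten", "Dense(4096)", "Dropout",
           "Dense(4096)", "Dropout", "Dense(1000)"]

def remaining_after_pops(layers, n):
    """Return the layers left after removing the last n, one pop at a time."""
    remaining = list(layers)  # copy so the original list is untouched
    for _ in range(n):
        remaining.pop()
    return remaining

two = remaining_after_pops(vgg_top, 2)    # what the precomputed features used
three = remaining_after_pops(vgg_top, 3)  # what train_last_layer(i) uses
```

If that layout is right, both versions end in a 4096-wide output, so BatchNormalization(input_shape=(4096,)) fits either way. The difference is that ll_feat comes from the output of the second Dense(4096), while the three-pop model ends at the Dropout that follows the first Dense(4096).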