Change it to Python 2. My guess is that Keras has a problem with Python 3.
I tried with Python 2. It works for a moment, then it goes back to behaving just like Python 3. It drives me crazy. I don’t know whether the model is cached somewhere, but between different runs of fit_generator my loss jumps from 0.3 to 8.0.
I seem to have a similar issue with the cats&dogs VGG16 fine-tuning…
I use python3 with keras2 and tensorflow. I use the keras VGG model, which is built with the functional API instead of the Sequential.
First I instantiated the full VGG and replaced the final layer with a Dense(2). Training worked fine. But when I tried to re-train all dense layers, the accuracy collapsed to 50%.
I thought there was something I did not understand that prevents re-training of the provided layers. So I instantiated only the conv part and added the full dense part myself. But again, re-training the last dense layer works, while re-training e.g. the last two dense layers makes the accuracy collapse.
I already tried small/big learning rates, and the Adam optimizer. I’m completely out of ideas.
Here is my final code. Change a single character and it works:
[-2:] -> [-1:] for setting the final layers learnable
%matplotlib inline
import os
from importlib import reload
import utils3; reload(utils3)
from utils3 import *

import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

def set_session_options(**kwargs):
    gpu_options = tf.GPUOptions(**kwargs)
    ktf.set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))

set_session_options(allow_growth=True)

path = "data/dogscats/"
model_path = path + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)

batch_size=64
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)
batches = get_batches(path+'train', shuffle=True, batch_size=batch_size)

val_classes = val_batches.classes
trn_classes = batches.classes
val_labels = onehot(val_classes)
trn_labels = onehot(trn_classes)

opt = RMSprop(lr=0.1)

def fit_model(model, batches, val_batches, nb_epoch=1):
    model.fit_generator(batches,
                        steps_per_epoch=ceil(batches.n / batches.batch_size),
                        epochs=nb_epoch,
                        validation_data=val_batches,
                        validation_steps=ceil(val_batches.n / val_batches.batch_size))

opt = RMSprop(lr=0.1)
#opt = Adam(lr=0.00001)

from keras.applications.vgg16 import VGG16
model = VGG16(include_top=False, input_shape=(224, 224, 3))
flatten = Flatten()
x = flatten(model.output)
x = Dense(4096, activation='relu')(x)
x = Dense(4096, activation='relu')(x)
x = Dense(2, activation='softmax')(x)
model = Model(inputs=model.input, outputs=x)

for layer in model.layers: layer.trainable=False
for layer in model.layers[-2:]: layer.trainable=True

model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.load_weights(model_path+'finetune1.h5')
fit_model(model, batches, val_batches, 2)
OK, I found the solution, at least in my case.
First, you need to compile after setting trainable. I think I read the opposite in one of Jeremy’s notebooks, but maybe it was different with keras1.
Second, this well-trained model is still quite sensitive. I had to include the dropouts AND reduce the learning rate: with Adam, 1e-4 for training 2 dense layers and 1e-5 for 3 dense layers.
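Roughly, the two points above could look like the sketch below. It is not my exact notebook code. The helper name build_finetune_model, the dropout rate of 0.5, and placing a Dropout after each 4096-unit Dense layer (as in the original VGG16 head) are assumptions for illustration; only the Adam learning rates come from the numbers above. Recent Keras versions spell the optimizer argument learning_rate, while older ones used lr.

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Dropout, Flatten
from keras.models import Model
from keras.optimizers import Adam

# Adam learning rates from the post: number of trainable dense layers -> rate.
ADAM_LR = {2: 1e-4, 3: 1e-5}


def build_finetune_model(n_trainable_dense, learning_rate=None,
                         weights="imagenet", input_shape=(224, 224, 3),
                         dense_units=4096, dropout_rate=0.5):
    """Hypothetical helper: VGG16 conv base plus a dense head with dropout."""
    base = VGG16(include_top=False, weights=weights, input_shape=input_shape)
    x = Flatten()(base.output)
    x = Dense(dense_units, activation="relu")(x)
    x = Dropout(dropout_rate)(x)  # assumed position: after the first dense layer
    x = Dense(dense_units, activation="relu")(x)
    x = Dropout(dropout_rate)(x)  # assumed position: after the second dense layer
    x = Dense(2, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=x)

    # Freeze everything, then unfreeze only the last n Dense layers.
    # Dropout layers have no weights, so they are skipped when counting.
    for layer in model.layers:
        layer.trainable = False
    dense_layers = [l for l in model.layers if isinstance(l, Dense)]
    for layer in dense_layers[-n_trainable_dense:]:
        layer.trainable = True

    # Compile AFTER changing the trainable flags so the change takes effect.
    if learning_rate is None:
        learning_rate = ADAM_LR[n_trainable_dense]
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling build_finetune_model(2) would give the setup for re-training the last two dense layers. You would then load the previously trained weights and fit as before. If you change the trainable flags again later, compile again before fitting.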
That’s it. Hope it helps you too.
Hi, follow-up question: is the position of the dropout important? For example, should the dropout come between the two dense layers, or after the dense layers and before the softmax?