Upon inspecting the pre-trained network provided in platform.ai/models vs. the one available in Keras, the two networks appear to have different architectures. For example, VGG16 provided by Jeremy has dropout layers between the dense layers, whereas the VGG16 network in keras.applications does not have dropout layers.
Is there any reason for this difference? I find that Jeremy's network works better for me. Did he add the dropout layers himself? For students who also want to see what I'm talking about, you can compare the architectures yourself like this:
To view Jeremy's vgg16 architecture:
from vgg16 import Vgg16
vgg = Vgg16()
vgg.model.summary()
To view the keras vgg16 architecture:
from keras.applications.vgg16 import VGG16
keras_vgg16 = VGG16()
keras_vgg16.summary()
Another question: where did Jeremy's VGG16 pre-trained network come from? Did he pre-train it himself or grab it from a resource somewhere? Is there a repo of pre-trained networks that someone recommends? Thanks
I grabbed the weights from https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3 and made some minor changes. The original VGG network used dropout; I'm not sure why Keras removed it. In the course I show how to change the amount of dropout (or remove it) as needed for each application.
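For anyone unsure what "the amount of dropout" controls, here is a minimal NumPy sketch of inverted dropout (as far as I know, the scheme Keras's Dropout layer uses): during training each activation is zeroed with probability p and the survivors are scaled by 1/(1-p), so the expected activation is unchanged and nothing needs rescaling at test time. The `dropout` function is a hypothetical helper, not a Keras API.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x                          # no-op at inference time or when dropout is disabled
    keep = rng.random(x.shape) >= p       # True for units that survive this pass
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones((1000, 4096), dtype=np.float32)
for p in (0.0, 0.5, 0.8):
    out = dropout(acts, p, rng)
    # thanks to the rescaling, the mean activation stays close to 1.0 for every p
    print(p, round(float(out.mean()), 3))
```

Setting p=0.0 removes dropout entirely; raising p regularizes the dense layers more strongly.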
import numpy as np
from keras.preprocessing import image
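# ImageNet per-channel means (RGB order), shaped (3, 1, 1) to broadcast over channels-first images
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3,1,1))
# I tried both using and not using this in the image loading; it makes no difference
def vgg_preprocess(x):
    x = x - vgg_mean  # subtract the per-channel mean
    return x
    # return x[:, ::-1] # reverse axis rgb->bgr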
import vgg16 as jeremy_vgg16
BATCH_SIZE = 64
DATA_PATH = "data/dogscats/sample/"
batches = jeremy_vgg16.Vgg16().get_batches(DATA_PATH+'train',
    gen=image.ImageDataGenerator(preprocessing_function=vgg_preprocess),
    batch_size=BATCH_SIZE)
val_batches = jeremy_vgg16.Vgg16().get_batches(DATA_PATH+'valid',
    gen=image.ImageDataGenerator(preprocessing_function=vgg_preprocess),
    batch_size=BATCH_SIZE)
from keras.applications import vgg16 as keras_vgg16
from keras.models import Model
from keras.layers import Dense, Flatten, Input
from keras.optimizers import SGD, RMSprop, Adam
input_layer = Input(shape=(3, 224, 224),  # channels-first image ordering
                    name='image_input')
base_model = keras_vgg16.VGG16(weights='imagenet', include_top=False)
x = base_model(input_layer)
x = Flatten(name='flatten')(x)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
predictions = Dense(2, activation='softmax', name='predictions')(x)
# this is the model we will train (Keras 1 API; Keras 2 uses inputs=/outputs=)
keras_vgg = Model(input=input_layer, output=predictions)
# freeze all convolutional Vgg16 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (after freezing, so the trainable flags take effect)
keras_vgg.compile(optimizer=RMSprop(), loss='categorical_crossentropy', metrics=['accuracy'])
# Keras 1 fit_generator argument names
keras_vgg.fit_generator(batches,
                        validation_data=val_batches,
                        samples_per_epoch=batches.nb_sample,
                        nb_val_samples=val_batches.nb_sample,
                        nb_epoch=10)
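A note on `vgg_preprocess` in the code above: `ImageDataGenerator` calls `preprocessing_function` on one image at a time (a rank-3 array), not on a batch. For a channels-first image of shape (3, 224, 224) the channel axis is axis 0, so the commented-out `x[:, ::-1]` would flip the image upside down rather than convert RGB to BGR; `x[:, ::-1]` reverses channels only on a 4-D batch of shape (N, 3, H, W). The NumPy sketch below shows both cases; the two helper names are hypothetical.

```python
import numpy as np

# ImageNet channel means in RGB order, shaped (3, 1, 1) for channels-first data
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess_single(x):
    """Hypothetical helper for ONE channels-first image of shape (3, H, W)."""
    x = x - vgg_mean      # (3, 1, 1) broadcasts over height and width
    return x[::-1]        # axis 0 is the channel axis here: RGB -> BGR

def vgg_preprocess_batch(x):
    """Hypothetical helper for a channels-first BATCH of shape (N, 3, H, W)."""
    x = x - vgg_mean      # (3, 1, 1) still broadcasts against (N, 3, H, W)
    return x[:, ::-1]     # axis 1 is the channel axis for a batch

img = np.zeros((3, 4, 4), dtype=np.float32)
img[0] = 255.0                                        # a pure-red test image
bgr = vgg_preprocess_single(img)                      # bgr[2] now holds the red channel
batch_bgr = vgg_preprocess_batch(img[np.newaxis])     # the same image as a batch of one
```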
@achiang The problem is with the two sequentially connected ReLU layers, fc1 and fc2. If you change one of them to another activation, e.g. softmax, your code will work properly.
@beniamin
Thanks a bunch! It worked when I changed the first layer, fc1, to softmax. (Curiously, not the other way around.) Do you know of a specific reason why two ReLU layers in a row don't work?
I am curious about this as well... As @achiang mentioned, training progress improves drastically when the activation of the first dense layer is changed to softmax, and also when I use just one softmax layer instead of the two ReLU layers (results below).
Also, it seems @jeremy's code from lesson 2 (relevant section from vgg16.py copied below) has two fully connected ReLU layers, albeit with 0.5 dropout after each, and seems to work fine.
@beniamin is there something I am missing that allowed you to identify the two dense ReLU layers back-to-back as problematic? Perhaps something to do with the magnitude of inputs from the convolutional layers being put into an unbounded function before being regularized/normalized by dropout/softmax?
Thanks a lot!
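On the question above about unbounded activations: ReLU is positively homogeneous, so with no bias term, multiplying a layer's input by 100 multiplies the ReLU output by 100, while softmax outputs always lie between 0 and 1 and sum to 1 however large the input gets. The NumPy sketch below only demonstrates that property, using random stand-in features and weights (hypothetical sizes, not real VGG activations); it is not a diagnosis of why the two-ReLU model in this thread failed to train.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.random((8, 512))              # stand-in for non-negative flattened conv features
W = rng.normal(0.0, 0.05, size=(512, 256))   # stand-in dense-layer weights (no bias)

for scale in (1.0, 100.0):
    z = (scale * features) @ W               # pre-activations of a hidden dense layer
    # the ReLU maximum grows with the input scale; the softmax maximum never exceeds 1
    print(scale, relu(z).max(), softmax(z).max())
```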
Results with just 1 softmax layer between convolution layers/flattening and the predictions, lr=.05:
@achiang, I see a couple of odd details in your output:
40 samples is definitely not enough to train a CNN in most cases
Your results stay the same after the 1st epoch. This makes me suspect that you have 20 samples per class and your model is simply guessing a single class every epoch. If this is the case, it makes sense that it has an accuracy of 0.5000.
What I recommend:
Get more data/samples
Look at your model's predictions to confirm the second detail
It's recommended that you use softmax strictly in the output layer, as it is only supposed to return an "array" of probabilities per class. My intuition tells me that the model would simply adjust any hidden softmax layers to have no effect, i.e. for 2 classes, output a likelihood of 0.5 for each one.
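One way to do that check, sketched in NumPy: count how often each class is predicted. If the model always predicts the same class on a balanced 20/20 set, accuracy is exactly 0.5. The `summarize_predictions` helper is hypothetical; with Keras you could get `probs` from `model.predict_generator(...)` and the true labels from `batches.classes`, assuming the generator was created with `shuffle=False` so the order matches.

```python
import numpy as np

def summarize_predictions(probs, labels):
    """probs: (N, n_classes) softmax outputs; labels: (N,) integer class ids."""
    pred = probs.argmax(axis=1)
    counts = np.bincount(pred, minlength=probs.shape[1])   # how often each class is predicted
    accuracy = float((pred == labels).mean())
    collapsed = int((counts > 0).sum()) == 1                # True if only one class is ever predicted
    return counts, accuracy, collapsed

# Hypothetical example: 20 cats (0) and 20 dogs (1), and a model that always says "cat"
labels = np.array([0] * 20 + [1] * 20)
probs = np.tile([0.6, 0.4], (40, 1))
print(summarize_predictions(probs, labels))   # counts [40, 0], accuracy 0.5, collapsed True
```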
How many samples are you training on?
Could you print both the results and the model's architecture/layers and share them with us?
Hmm? I think @achiang and I initially shared your intuition, but found that ReLU hidden layers were unable to fit anything in this (admittedly small) example, while softmax hidden layers fit right away. It seems it was obvious to @beniamin that softmax was needed and that two ReLU layers would fail as hidden layers, but I don't know why.
We are both using the same small number of samples and the same architecture/layers as in the code @achiang posted, with the hidden layers modified as discussed.
The sample set is small, but it contains both classes. I think the small sample set doesn't matter since the CNN layers are frozen.
Actually, I tried it with the full-sized train/validation dogscats data provided in Week 1, but there was still no progress in training. And as @vinvinvin pointed out, with the softmax hidden layer it works on the sample dataset with no problems.
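Even with the convolutional layers frozen, the trainable part of the model is large. Assuming 224x224 inputs (so the frozen VGG16 base outputs a 7x7x512 feature map), the parameter count of the dense head in @achiang's Keras 1 code works out as follows; `dense_params` is just an illustrative helper.

```python
# Trainable parameters in the fc1 -> fc2 -> predictions head (conv base frozen)
flattened = 7 * 7 * 512                  # 25,088 inputs to fc1 after Flatten

def dense_params(n_in, n_out):
    return n_in * n_out + n_out          # weight matrix + bias vector

fc1 = dense_params(flattened, 4096)
fc2 = dense_params(4096, 4096)
predictions = dense_params(4096, 2)
total = fc1 + fc2 + predictions
print(f"{fc1:,} + {fc2:,} + {predictions:,} = {total:,}")
# 102,764,544 + 16,781,312 + 8,194 = 119,554,050
```

That is roughly 120 million weights to fit from the small sample set, which is why the sample size still matters here.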
Thanks to this amazing course and the Deep Learning with Python e-book by Francois Chollet himself, spotted and tweeted by our own @EricPB a couple of days ago, I am almost able to build my own nets in Keras. My problem at the moment is coding the pre-built architectures effectively for different datasets (e.g. the Kaggle Planet competition) and with hyperparameter optimizers such as hyperas. As for VGG16 and Jeremy's Lesson 2 of Part 1, I am still trying to figure out how to make it work with my 16GB of RAM; my local Python kernel currently shuts down halfway through.
When I tried to build a new model based on VGG16 on my Mac, I ran into the same problem. Whatever I changed in my model, accuracy seemed stuck at 0.5 forever.
In the end I found the cause and fixed it. Here are some tips from my solution (I use the VGG16 model built into Keras 2, not the one Jeremy used):
Do not use augmentation on the validation set.
We should always be careful about this when using ImageDataGenerator on both the training set and the validation set.
When I did the Lesson 2 assignment on my Mac, I used just 40 examples each for the training and validation sets, which is pretty small for such a large model.
And I applied augmentation to both the training and validation sets, which made my accuracy even lower than 0.5.
Use dropout and try different architectures.
I tried many different architectures while doing this.
The architecture below is based on the bottleneck output of the Keras VGG16 model (without the last 3 FC layers).
It may not be the best solution, but it is good enough for the 40 training examples. It is also much smaller than the VGG16 FC layers.
Try different learning rates, starting with a small value.
I started training with lr=0.0001 rather than the default 0.01, and tried many different values.
This is just an experiment on a small dataset, but it is enough before training on the full dataset in a GPU environment. At least we found an approach that seems to be right for training a new model for this classification problem.
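The first tip can be sketched as follows: apply random augmentation only when preparing training batches, and pass validation batches through untouched. This NumPy version uses a random horizontal flip as the augmentation, and the helper names are hypothetical. In Keras the equivalent is to give the training generator augmentation options such as `ImageDataGenerator(horizontal_flip=True)` and build the validation generator without them (keeping any `preprocessing_function` the same for both).

```python
import numpy as np

def random_hflip(batch, rng):
    """Flip each image of a channels-last batch (N, H, W, C) left-right with probability 0.5."""
    flip = rng.random(len(batch)) < 0.5      # one coin toss per image
    out = batch.copy()
    out[flip] = batch[flip][:, :, ::-1, :]   # reverse the width axis of the chosen images
    return out

def prepare_batch(batch, training, rng):
    # Augment only while training; validation batches pass through unchanged
    # so that validation accuracy measures the model on the real images.
    return random_hflip(batch, rng) if training else batch

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
train_batch = prepare_batch(images, training=True, rng=rng)
val_batch = prepare_batch(images, training=False, rng=rng)
```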
The code below worked for me using Keras 2. Be sure to set include_top=True when loading the Keras VGG16 model.
Instead of popping off VGG's original 1,000-category prediction layer, just connect your new prediction layer to the last FC layer in VGG:
from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Model
from keras.optimizers import Adam

# retrieve the full Keras VGG model including imagenet weights
vgg = VGG16(include_top=True, weights='imagenet',
            input_tensor=None, input_shape=(224,224,3), pooling=None)
# set all VGG layers to non-trainable
for layer in vgg.layers:
    layer.trainable = False
# define a new output layer to connect with the last fc layer in vgg (fc2)
# thanks to joelthchao https://github.com/fchollet/keras/issues/2371
x = vgg.layers[-2].output
output_layer = Dense(2, activation='softmax', name='predictions')(x)
# combine the original VGG model with the new output layer
vgg2 = Model(inputs=vgg.input, outputs=output_layer)
# compile the new model
vgg2.compile(optimizer=Adam(lr=0.001),
             loss='categorical_crossentropy', metrics=['accuracy'])
# run it! batches / val_batches are assumed to be Keras 2 DirectoryIterators
# (e.g. from flow_from_directory with target_size=(224, 224))
vgg2.fit_generator(batches,
                   steps_per_epoch=batches.samples // batches.batch_size,
                   validation_data=val_batches,
                   validation_steps=val_batches.samples // val_batches.batch_size,
                   epochs=1)