After finishing lesson 7, I was really impressed by the FCN: it can create a heat map that shows where the model is looking for fish in the picture. In another section, we used bounding boxes to tell the model where it should look for fish. So I came up with an idea: what if we combine the FCN and the bounding box?
So I created an architecture: I use VGG640 to precompute the convolutional features, then build an FCN model on top of them like this:
from keras.models import Model
from keras.layers import (Input, BatchNormalization, Convolution2D,
                          Flatten, Dense, GlobalAveragePooling2D, Activation)

nf = 128  # number of filters per conv block (matches the summary below)

def get_bb_model():
    # input: the precomputed VGG640 conv features, shape (512, 22, 40)
    inp = Input(conv_layers[-1].output_shape[1:])
    x = BatchNormalization(axis=1)(inp)
    x = Convolution2D(nf, 3, 3, activation='relu', border_mode='same')(x)
    x = BatchNormalization(axis=1)(x)
    x = Convolution2D(nf, 3, 3, activation='relu', border_mode='same')(x)
    x = BatchNormalization(axis=1)(x)
    x = Convolution2D(nf, 3, 3, activation='relu', border_mode='same')(x)
    x = BatchNormalization(axis=1)(x)
    # 8 channels, one per fish class: this is also where the heat map comes from
    x = Convolution2D(8, 3, 3, border_mode='same')(x)
    # bounding-box head: regress 4 numbers from the flattened feature map
    x_bb = Flatten()(x)
    x_bb = Dense(4, name='bb')(x_bb)
    # classification head: average each channel over space, then softmax
    x_class = GlobalAveragePooling2D()(x)
    x_class = Activation('softmax')(x_class)
    return Model(inp, [x_bb, x_class])
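For context, `conv_layers` is the list of VGG640's convolutional layers (VGG16 run on 640×360 inputs), and the features are computed once up front. A minimal sketch of that precompute step; the `trn`/`val` image arrays and the batch size here are my assumptions:

from keras.models import Sequential

# stack just the convolutional part of VGG640 and push the images through once
conv_model = Sequential(conv_layers)
conv_feat = conv_model.predict(trn, batch_size=32)       # (3277, 512, 22, 40)
conv_val_feat = conv_model.predict(val, batch_size=32)   # (500, 512, 22, 40)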
Calling model.summary(), we can see the architecture clearly:
____________________________________________________________________________________________________
Layer (type)                                       Output Shape          Param #     Connected to
====================================================================================================
input_3 (InputLayer)                               (None, 512, 22, 40)   0
batchnormalization_9 (BatchNormalization)          (None, 512, 22, 40)   1024        input_3[0][0]
convolution2d_22 (Convolution2D)                   (None, 128, 22, 40)   589952      batchnormalization_9[0][0]
batchnormalization_10 (BatchNormalization)         (None, 128, 22, 40)   256         convolution2d_22[0][0]
convolution2d_23 (Convolution2D)                   (None, 128, 22, 40)   147584      batchnormalization_10[0][0]
batchnormalization_11 (BatchNormalization)         (None, 128, 22, 40)   256         convolution2d_23[0][0]
convolution2d_24 (Convolution2D)                   (None, 128, 22, 40)   147584      batchnormalization_11[0][0]
batchnormalization_12 (BatchNormalization)         (None, 128, 22, 40)   256         convolution2d_24[0][0]
convolution2d_25 (Convolution2D)                   (None, 8, 22, 40)     9224        batchnormalization_12[0][0]
flatten_3 (Flatten)                                (None, 7040)          0           convolution2d_25[0][0]
globalaveragepooling2d_3 (GlobalAveragePooling2D)  (None, 8)             0           convolution2d_25[0][0]
bb (Dense)                                         (None, 4)             28164       flatten_3[0][0]
activation_3 (Activation)                          (None, 8)             0           globalaveragepooling2d_3[0][0]
====================================================================================================
Total params: 924300
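To train the two heads together, the model needs one loss per output: regression for the box and cross-entropy for the class, with the box loss scaled down so it doesn't swamp the classifier. Here is a minimal sketch, assuming MSE for the box and a 0.001 weight on it (in the log below, 0.001 × 1274.4607 + 0.1236 ≈ 1.398 matches val_loss, so a weight around 0.001 is consistent); the target arrays `trn_bbox`/`trn_labels`/`val_bbox`/`val_labels` are my names:

from keras.optimizers import Adam

model = get_bb_model()
model.compile(Adam(lr=0.001),
              loss=['mse', 'categorical_crossentropy'],
              metrics=['accuracy'],
              loss_weights=[0.001, 1.])
# two targets per sample: a 4-number box and a one-hot class over the 8 fish types;
# I called fit once per epoch (10 epochs total), so the log below is the last call
model.fit(conv_feat, [trn_bbox, trn_labels], batch_size=32, nb_epoch=1,
          validation_data=(conv_val_feat, [val_bbox, val_labels]))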
After running 10 epochs, I got a pretty good result:
Train on 3277 samples, validate on 500 samples
Epoch 1/1
3277/3277 [==============================] - 25s - loss: 0.2828 - bb_loss: 258.3640 - activation_3_loss: 0.0244 - bb_acc: 0.8419 - activation_3_acc: 0.9997 - val_loss: 1.3981 - val_bb_loss: 1274.4607 - val_activation_3_loss: 0.1236 - val_bb_acc: 0.7800 - val_activation_3_acc: 0.9720
That's 97.2% validation accuracy on the fish classes, with a classification loss of only 0.12!
Let's visualize the result of one sample:
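Here's roughly how that plot can be produced. This is only a sketch: `val_images` (the validation images in plottable height × width × channel order), the sample index `idx`, and the (x, y, w, h) box encoding are all my assumptions:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

idx = 0                                  # any validation sample
pred_bb, pred_probs = model.predict(conv_val_feat[idx:idx + 1])
x, y, w, h = pred_bb[0]                  # assuming boxes were encoded as (x, y, w, h)

fig, ax = plt.subplots()
ax.imshow(val_images[idx])               # the original (resized) image
ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', lw=2))
plt.title('predicted class: %d' % pred_probs[0].argmax())
plt.show()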
My intuition is this: the model has learned good features in the last convolutional layer, which outputs 8 channels (one per class), and those channels form a heat map telling us it was looking for fish in the pink areas. In other words, the model already focuses on the pink areas; if we also tell it, via the bounding box, that there's a fish in one particular pink area, SGD will keep adjusting its "vision" until it finds the fish.
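If you want to reproduce the heat map itself, here's a sketch: pull the 8-channel output of the last convolution out with a backend function, pick the predicted class's channel, and overlay it upsampled on the image. The 360×640 image size and the `val_images`/`idx` names are the same assumptions as above:

import scipy.misc
from keras import backend as K

# grab the final 8-channel Convolution2D layer (the one feeding Flatten and GAP)
l_conv = [l for l in model.layers if type(l).__name__ == 'Convolution2D'][-1]
get_heat = K.function([model.input, K.learning_phase()], [l_conv.output])

heat = get_heat([conv_val_feat[idx:idx + 1], 0])[0][0]   # (8, 22, 40), test mode
cls = model.predict(conv_val_feat[idx:idx + 1])[1][0].argmax()

plt.imshow(val_images[idx])
plt.imshow(scipy.misc.imresize(heat[cls], (360, 640)),   # upsample 22x40 -> 360x640
           cmap='cool', alpha=0.5)                       # the pink overlay
plt.show()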
That's a perfect cooperation between human and AI!
If you have a better idea, I hope you'll share it with us so we can improve the result together!
PS: You can download my ipynb from here.