Hey @rforgione, my understanding is that you have 2 choices in using a CNN to classify an image:
**Choice 1:** Take an existing pre-trained CNN such as VGG and chop off the softmax layer to get at one of the dense layers that follow the convolutional layers. Now when you call `predict` on a new image, you get that layer's activations.
You could (not necessary, but as a thought experiment) save these to a CSV and view them as a replacement for your images. Instead of images, you now have activations. You still need to do your final task, which is to classify your image, and you can build a classifier however you like, e.g. feed the activations into a logistic regression or a random forest.
Alternatively, you could use Keras to fit a vanilla neural network (an MLP, not convolutional) on top of your activations, ending in a softmax layer, and use that as your final classifier.
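Choice 1 might look something like this sketch. I'm using a tiny stand-in CNN instead of the real VGG (which you'd get from `keras.applications.VGG16`) so it runs without downloading weights; all the layer names and shapes here are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from tensorflow import keras
from tensorflow.keras import layers

np.random.seed(0)

# Tiny stand-in for a pre-trained CNN like VGG:
# convolutional layers followed by dense layers.
base = keras.Sequential([
    layers.Input((32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(16, activation="relu", name="penultimate"),
    layers.Dense(10, activation="softmax"),  # the softmax layer we chop off
])

# "Chop off" the softmax: a new model that stops at the last dense layer.
feature_extractor = keras.Model(base.input, base.get_layer("penultimate").output)

# Calling predict now yields activations, not class probabilities.
images = np.random.rand(20, 32, 32, 3).astype("float32")
activations = feature_extractor.predict(images, verbose=0)
print(activations.shape)  # (20, 16): one activation vector per image

# Fit whatever classifier you like on the activations.
labels = np.random.randint(0, 2, size=20)
clf = LogisticRegression(max_iter=1000).fit(activations, labels)
```

From here you could just as easily save `activations` to a CSV, or swap the logistic regression for a random forest or a small MLP ending in softmax.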
**Choice 2:** Use a pre-trained CNN like in choice 1, but don't chop off any dense layers except the final softmax layer, which you'll replace with a layer for your problem so that it has the correct number of output classes (e.g. for cats vs. dogs, replace the final layer that predicts 1000 classes with one that predicts just 2 classes).
Set `trainable = False` on the convolutional layers and just re-train the dense layers on your image data.
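A sketch of choice 2, again with a tiny stand-in model rather than the real VGG (shapes and layer choices are assumptions): freeze the convolutional layers, swap the 1000-class head for a 2-class one, and re-train.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

np.random.seed(0)

# Tiny stand-in for a pre-trained CNN (real VGG ends in a 1000-class softmax).
base = keras.Sequential([
    layers.Input((32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1000, activation="softmax"),  # original ImageNet-style head
])

# Freeze the convolutional layers so only the dense layers get re-trained.
for layer in base.layers:
    if isinstance(layer, layers.Conv2D):
        layer.trainable = False

# Replace the 1000-class softmax with a 2-class head (e.g. cats vs. dogs).
hidden = base.layers[-2].output  # last dense layer before the softmax
new_head = layers.Dense(2, activation="softmax")(hidden)
model = keras.Model(base.input, new_head)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Re-train on your own (here: random placeholder) image data.
images = np.random.rand(20, 32, 32, 3).astype("float32")
labels = np.random.randint(0, 2, size=20)
model.fit(images, labels, epochs=1, verbose=0)
preds = model.predict(images, verbose=0)
print(preds.shape)  # (20, 2)
```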
**Important note on compositionality**
Notice that if you use Keras to add back the same dense and softmax layers that you chopped off, that's the same as if you had never chopped them off in the first place.
Chopping off the dense layers and then adding them back doesn't change anything (assuming the convolutional layers are not trainable): the convolutional part that gives you the activations and the dense layers you add on top of those activations are compositional.
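You can verify this numerically with a toy model (shapes are arbitrary assumptions): running the conv part and then the re-attached dense layer gives exactly the same predictions as the full model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

np.random.seed(0)

# Tiny stand-in model: conv part, then a dense softmax head.
full = keras.Sequential([
    layers.Input((8, 8, 3)),
    layers.Conv2D(4, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(5, activation="softmax"),
])

# The "chopped" convolutional part, ending where the activations come out.
conv_part = keras.Model(full.input, full.layers[-2].output)

x = np.random.rand(10, 8, 8, 3).astype("float32")
preds_full = full.predict(x, verbose=0)

# Run the conv part, then feed its activations through the re-added dense layer.
activations = conv_part.predict(x, verbose=0)
preds_composed = full.layers[-1](activations).numpy()

# Same numbers either way: the two parts compose.
print(np.allclose(preds_full, preds_composed, atol=1e-5))  # True
```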
Say, for example, your images are of furniture and you know for each image the following 3 different attributes:
- furniture type (chair / sofa / table)
- furniture material (wood / leather / fabric)
- furniture colour (black / brown / red)
Now if you go with choice 2 to classify a new image for each of the 3 attributes, you will have to recompute the convolutional layers for each classifier.
If instead, you go with choice 1, you only have to compute the convolutional outputs once and then fit 3 different classifiers on the same activations for each of your 3 attributes you’re trying to predict.
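With choice 1, the furniture example boils down to something like this sketch (the feature count and label encodings are assumptions; VGG's dense layers actually give 4096-dimensional activations):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

np.random.seed(0)

# Pretend these are dense-layer activations computed ONCE for 100 furniture
# images by the convolutional part of a pre-trained CNN.
activations = np.random.rand(100, 512)

# One label array per attribute, 3 classes each.
targets = {
    "type": np.random.randint(0, 3, size=100),      # chair / sofa / table
    "material": np.random.randint(0, 3, size=100),  # wood / leather / fabric
    "colour": np.random.randint(0, 3, size=100),    # black / brown / red
}

# The expensive convolutional pass is shared; only the cheap classifiers differ.
classifiers = {
    name: LogisticRegression(max_iter=1000).fit(activations, y)
    for name, y in targets.items()
}
print(sorted(classifiers))  # ['colour', 'material', 'type']
```

With choice 2 you'd instead need three full models, each re-running the convolutional layers on every image.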
For more detail, see the section under the heading “Aside: Pre-calculating Convolutional Layer Output” from the lesson 3 notes here:
In both of the cases above, I'm assuming you don't have enough data to train the convolutional layers from scratch. If you do, the differences between choice 1 and choice 2 remain the same, except you'd want to train the convolutional layers on a single attribute, e.g. furniture type (or possibly train on all 3 attributes in sequence, chopping off the final layer and training the convolutions some more with a different final layer each time).
Hope that helps!