I downloaded the Food-101 dataset and split it into training/test sets in the same folder arrangement as the dogs/cats dataset. The text in the notebook suggests that I can run the code as is and it should make the right predictions. I let it train for ~10 minutes and it was getting ~30% accuracy, so it appeared to be “working.”
Then I realized that the vgg python file is downloading some dogs/cats specific things.
How can I remove that dependency so that it can retrain and predict on my dataset without using the vgg.h5 weights and the weirdly formatted JSON file?
If I use the code at the end of the lesson, I think I can train the model that way, but without the JSON file I’ll have no way of actually seeing the results. What can I do here?
In the Vgg16 `create` method, you can remove the following line:
`model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))`
Then the model will initialize to random weights, and you can train it from scratch on your data.
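To illustrate the idea, here is a minimal sketch (assuming tf.keras; the tiny architecture, layer sizes, and input shape are made up for brevity and are not the real VGG16):

```python
# Illustrative sketch only: a small convnet that, like a VGG16 with the
# load_weights() call removed, starts from random initialization.
# Assumes tf.keras; layer sizes and input shape are placeholders.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense

def create_model(num_classes):
    model = Sequential([
        Input(shape=(64, 64, 3)),
        Conv2D(16, (3, 3), activation='relu'),
        GlobalAveragePooling2D(),
        Dense(num_classes, activation='softmax'),
    ])
    # No model.load_weights(...) here: the weights stay random, so the
    # network has to learn everything from your dataset via fit().
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = create_model(num_classes=101)  # Food-101 has 101 classes
```

As discussed below, though, training from random weights like this will give much worse results than starting from the pretrained weights.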
Hi @fyber, there might be a misunderstanding here: the architecture of models (i.e. the layers, convolutions, etc.) in Keras is stored as JSON files, while the weights are stored as HDF5 (.h5) files, see here. The weights are what the model has learned. You need both to run VGG16.
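To make that split concrete, here is a small sketch (assuming tf.keras; the tiny model and the filename are placeholders):

```python
# Sketch of how Keras separates architecture (JSON) from weights (HDF5).
# Assumes tf.keras; the tiny model and filename are placeholders.
import numpy as np
from tensorflow.keras.models import Sequential, model_from_json
from tensorflow.keras.layers import Input, Dense

model = Sequential([Input(shape=(8,)),
                    Dense(4, activation='relu'),
                    Dense(2, activation='softmax')])

arch_json = model.to_json()               # layers/shapes only, no learned values
model.save_weights('demo.weights.h5')     # learned parameters only

rebuilt = model_from_json(arch_json)      # same architecture, fresh random weights
rebuilt.load_weights('demo.weights.h5')   # now identical to the original model

x = np.random.rand(3, 8).astype('float32')
assert np.allclose(model(x).numpy(), rebuilt(x).numpy())
```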
The VGG net used here has been trained on Imagenet.org data. The 1000 classes include dogs and cats, but are by no means specific to just these two classes. Moreover, there is food in the Imagenet data, so VGG will be good at recognizing this as well.
The big take-home of the lessons on convolutional models, however, is that even though a model like VGG16 may not have been trained explicitly on your kind of data, it can still be used to classify it. This is called transfer learning, and the adaptation of a model to a new task is finetuning, i.e. adding a few more layers on top that are specific to your classification problem, while leaving the VGG weights untouched (or training only a few layers).
The reason this works is visualized very well in the papers by Matt Zeiler and Yosinski (see lesson 3, I believe). In short: features learned by the model may be general enough (edges, gradients, patterns) to be applicable to your food dataset, in spite of the pictures looking quite different. Your custom top layers merely pick the right mixture of those features that is relevant for detecting the different food classes.
Ah ok, I was misunderstanding how this worked. Training from the provided weights produces much better results than attempting to train from scratch.
If you remember from the lessons, he disabled training on the earlier layers (`layer.trainable = False`). This way, the training (mostly of the CNN layers) that was done using ImageNet stays in place, unaltered; all we are doing is training the dense layers that sit on top of the pre-trained CNN layers.
Exactly. After the VGG layers have been kept constant and your custom layers on top have been trained for a few epochs, some people like to finetune the VGG layers with a low learning rate. This Keras blog is very helpful here.
The underlying idea is that you can classify a new body of data much more quickly (fewer training epochs) and with an order of magnitude or two less training data when using this transfer learning approach vs training a new deep architecture from scratch.