Two lessons deep and really enjoying the course so far. Trying to understand one of the concepts regarding the application of the VGG model to items outside of its pre-trained scope.
Would the following notion be correct: the VGG model’s pre-trained weights and layers can be re-applied to other image classification tasks by “fine-tuning” the final layers of the network, because the earlier layers have already learned general image recognition patterns?
Additionally, I’d like to jump straight into applying this to a real-world image recognition problem I have. I have a clean set of imagery of china and crystal plates whose patterns are generally well defined, and a few training/testing samples for each. If I use the Keras ImageDataGenerator class to lightly transform the associated data, how many samples of each do I truly need to make this accurate?
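For reference, the augmentation I have in mind would look something like this (the transform ranges below are illustrative guesses, not tuned values):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mild, label-preserving transforms; the exact ranges are placeholders
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

# A single dummy "plate" image: batch of 1, 224x224 RGB
x = np.random.rand(1, 224, 224, 3).astype("float32")

# Draw a few augmented variants from the generator
batches = datagen.flow(x, batch_size=1)
augmented = [next(batches)[0] for _ in range(4)]
```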
Looking forward to having some fun with this.
VGG has been trained on a wide variety of objects (the ImageNet dataset of 1.3 million images in 1,000 different categories). So it has a certain knowledge of what those objects look like, i.e. it has learned general image recognition patterns.
You can re-use this knowledge by fine-tuning VGG on your own data. However, if your data is very different from the data VGG was trained on, this might not work very well. For example, VGG was not trained on microscopic images, so trying to fine-tune it on pictures of bacteria and the like probably won’t get you far.
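The fine-tuning setup described above could be sketched roughly like this in Keras: keep the convolutional base, freeze its layers, and train only a new classification head. (The head architecture and the two-class output are illustrative assumptions; in real use you would pass `weights="imagenet"` to get the pre-trained filters, which is skipped here only to keep the sketch self-contained offline.)

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# VGG16 convolutional base without its 1000-class ImageNet head.
# Use weights="imagenet" in practice; weights=None avoids the download here.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained layers so only the new head is trained at first
for layer in base.layers:
    layer.trainable = False

# Hypothetical new head for a 2-class problem (e.g. china vs. crystal)
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
out = Dense(2, activation="softmax")(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Once the new head has converged, a common next step is to unfreeze some of the top convolutional layers and continue training with a low learning rate.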
For your use case it will probably work OK. However, the more training images you have, the better. Data augmentation will help but only gets you so far. Some people use synthetic images to artificially enlarge their dataset.
I would suggest first trying it out with just the images you have, to see what sort of accuracy you’re getting.
Thanks. As my area of focus is really the designs on a given surface (such as a plate), would preprocessing, such as extracting the circular plate area from the background, likely be beneficial?
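In case it helps, the circle-extraction idea could be sketched with a simple mask. (The centre and radius values here are placeholders; in practice they would be detected automatically, e.g. with OpenCV’s HoughCircles.)

```python
import numpy as np

def crop_circle(img, cx, cy, r):
    """Zero out everything outside a circle of radius r centred at (cx, cy)."""
    h, w = img.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    # Boolean mask of pixels inside the circle
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
    out = np.zeros_like(img)
    out[mask] = img[mask]
    return out

# Dummy 100x100 white RGB image standing in for a plate photo
img = np.full((100, 100, 3), 255, dtype=np.uint8)
masked = crop_circle(img, cx=50, cy=50, r=40)
```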