Building a model similar to Google's Teachable Machine


I would like to build a CNN model that is about as reliable as Google’s Teachable Machine.

There, a few images (25 per category) are enough for the classifier to become highly reliable. Apparently it uses transfer learning, so I trained a simple CNN based on a pretrained ResNet-18:

cnn_learner(data, models.resnet18, pretrained=True, metrics=accuracy)

After 20 epochs I get the following results:
train_loss valid_loss accuracy
0.098954 0.222413 0.954545

However, when I apply the model to a live feed from a moving webcam, the predictions are quite sensitive to small changes in the image. Google’s web demo is much more robust (and needs barely any training).

What can I do to get similar performance from a classifier with just a couple of classes? Is there an example architecture (using fastai), similar to the approach quoted below, that I could use to get started?

Here is a quote of the essential parts that describe the approach:

This network is trained to recognize all sorts of classes from the imagenet dataset. Instead of reading the prediction values from the MobileNet network, we instead take the second to last layer in the neural network and feed it into a KNN (k-nearest neighbors) classifier that allows you to train your own classes.

The benefit of using the MobileNet model instead of feeding the pixel values directly into the KNN classifier is that we use the high level abstractions that the neural network has learned in order to recognize the Imagenet classes. This allows us with very few samples to train a classifier that can recognize things like smiles vs frown, or small movements in your body.
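
To make the quoted approach concrete, here is a minimal sketch of the KNN half of it, as I understand it. Each image is assumed to have already been turned into a feature vector by taking the activations of the second-to-last layer of a pretrained network; the toy 3-dimensional vectors below only stand in for those activations. The names EmbeddingKNN, add_example and predict are hypothetical, and the choice of cosine similarity and k=3 is my own assumption, since the quote does not say which distance or k is used.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingKNN:
    """k-nearest-neighbours classifier over precomputed feature vectors.

    Hypothetical helper for illustration; not part of fastai or any library.
    """

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (embedding, label) pairs

    def add_example(self, embedding, label):
        # "Training" only stores the embedding; no network weights change.
        self.examples.append((list(embedding), label))

    def predict(self, embedding):
        if not self.examples:
            raise ValueError("no examples stored yet")
        # Rank stored examples by similarity to the query, most similar first.
        ranked = sorted(
            self.examples,
            key=lambda ex: cosine_similarity(ex[0], embedding),
            reverse=True,
        )
        neighbours = ranked[: self.k]
        votes = Counter(label for _, label in neighbours)
        label, count = votes.most_common(1)[0]
        # Fraction of neighbours that voted for the winner: a rough confidence.
        return label, count / len(neighbours)


# Toy vectors standing in for real penultimate-layer activations (assumption).
knn = EmbeddingKNN(k=3)
for vec in ([1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.2]):
    knn.add_example(vec, "smile")
for vec in ([0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.2, 1.0, 0.1]):
    knn.add_example(vec, "frown")

label, confidence = knn.predict([0.95, 0.15, 0.05])
print(label, confidence)
```

Because adding a class only means storing a few more vectors, nothing is fine-tuned, which may be part of why the demo needs so little training. With a real network, the stored vectors would come from a forward pass that stops before the final classification layer.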