Model Distillation as a way to quickly train models on low powered devices?

When playing around with different things, I think I’ve come across a way to quickly train a somewhat reliable image classifier (for just a few classes) without having to train a neural network which may be useful if creating an app that requires you to train models on the fly without a GPU. Here are the steps:

Suppose X is your set of images and Y is your set of corresponding labels.

Normally, you’d use a neural network and run fit(X,Y) on it.

You can also try fitting X to Y using a Random Forest model as well but your results will be pretty bad.

Now suppose you have a pretrained Imagenet Model, let’s call it M which takes an image and returns a 1000-dimensional probability vector which is the probability the object in the picture is that particular class for each of the 1000 classes.

Now how about this: instead of fitting X to Y using a Random Forest, you fit M(X) to Y using Random Forests? Basically you “preprocess” each image in X by having the Imagenet make a prediction and convert it to a 1000-D vector then fit those probability vectors in your Random Forest model?

I tried this with the Dogs and Cats dataset: simply converting each image to a 224x224 image then flattening them out before stacking them together and training the Random Forest on that resulted in an accuracy of maybe 55% (barely above random guessing.)

But when I converted each 224x224 image to a 1000-D probability vector first using the help of Resnet34 then training the Random Forest on the matrix of stacked probability vectors, I got an accuracy of about 92%. This, of course, isn’t state of the art in accuracy but it took a fraction of the time it took to attack this problem using a full-blown neural network.

I’m currently trying to build an app that gives picture recommendations to users, and this might come in handy if I don’t need completely accurate recommendations but just better than random guessing. (And keep in mind I’m building a content-based recommendation system rather than one based off of how others rated the images so collaborative filtering is out of the question. Think of this as the “Pandora” for images where the system makes new recommendations based off of the content of the photos previously rated rather than how others rated the photos.)

Is this, in essence, the same as Distillation?

Also, is there any way I can do the same with Ulmfit? In other words, can I download the Ulmfit model and, without training, it, directly pass text to it and get a probability vector output in the same way that I can download the pretrained Resnet34 (or some other Imagenet model) and, without training it, pass it an image and get a probability vector for 1000 classes?

1 Like

“Distilling the knowledge in neutral network” paper by Hinton el al exactly takes this approach. You may have a look