Using an Image Classifier in a Web App and having it retrain on new images (that the user submits) periodically

xjdeng · February 28, 2019, 3:33am

Is this feature supported? If so, what’s the best practice for doing this?

Let’s say I’m building an app that helps a user pare down the images in their social media feeds. The app goes through the feed and the user rates each image “thumbs up” or “thumbs down”. Over time, the system learns the user’s preferences (basically by training a “thumbs up” vs. “thumbs down” classifier, just like cats vs. dogs) and only shows the pictures the user is likely to give a thumbs up.

Here’s the problem: by the nature of this task, the images needed to train such a classifier aren’t all available on day 1. Let’s say the user is asked to rate N images during on-boarding, before an initial model is trained. The next day, the trained classifier is functional and can give the user some recommendations. That day, the user ends up rating M more images. Is there any way, using fastai, to update the model to reflect those M images and their respective classes?

Obviously, at this point, you could retrain on all N + M images, but that would take O(N + M) time. Is there an O(M) solution for updating the model?

maral · February 28, 2019, 3:59am

It is possible. However, instead of training a classifier, you could use a neural network to extract a feature vector for every image. You then have a set of feature vectors (A, B, …, Z) that defines your search space, and each feature vector has a label: thumbs up or thumbs down. When a new image comes in, you generate its feature vector and find the k nearest feature vectors in that space. You can try different distance measures (cosine, Euclidean, etc.). You then classify the new image as thumbs up or down based on the majority label among those k neighbours. This approach avoids the need to retrain a classifier when new ratings arrive: a newly rated image is simply added to the set.
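The nearest-neighbour idea above can be sketched in a few lines. This is only a minimal illustration, assuming the feature vectors have already been extracted by some network; the toy 2-D vectors, the labels, and the helper names `cosine_distance` and `knn_predict` are all made up for the example, and the number of neighbours is called k here.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(query, features, labels, k=3, distance=cosine_distance):
    # Rank stored vectors by distance to the query and take a majority vote
    # over the labels of the k closest ones.
    ranked = sorted(range(len(features)),
                    key=lambda i: distance(query, features[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy stand-ins for feature vectors extracted from already-rated images.
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["up", "up", "down", "down"]

# A newly rated image is just appended; nothing is retrained.
features.append([0.8, 0.3])
labels.append("up")

prediction = knn_predict([0.95, 0.15], features, labels, k=3)
```

With this layout, adding M new ratings costs M appends. The price is paid at query time, where this naive version scans every stored vector.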

xjdeng · February 28, 2019, 3:26pm

Ok, that’s one way of doing it.

Correct me if I’m wrong, but it seems that you need a DataBunch with data already loaded in order to create a CNN learner, and that you can’t change the dataloader associated with that learner once it has been created.

Let’s reframe the question: is there any way in fastai to train a learner on one set of data and then switch it to a different set, under the following conditions?

You don’t know what the second set of data is while you’re training on the first set.
You don’t reset your network when training on the second set: you continue training, just with different images and their respective labels.

This is pretty trivial in Keras, and I’m sure in PyTorch as well, by doing something like:

model.fit(x1, y1)  # initial training on the first set
# Some time later, continue from the current weights on the new set:
model.fit(x2, y2)

and I suppose I could implement it in either framework, but I’d lose the fastai improvements in the process, including but not limited to cyclical learning rates and layer unfreezing.
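To make the fit-then-fit-again pattern concrete without depending on any framework, here is a rough sketch. It is not fastai, Keras, or PyTorch code: `TinyLogisticModel` is a hypothetical stand-in for a network, trained with plain SGD on made-up 2-D features, and is only meant to show that the weights carry over between calls.

```python
import math

class TinyLogisticModel:
    """Hypothetical stand-in for a network; weights persist across fit() calls."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, xs, ys, epochs=50):
        # Plain SGD over only the examples passed in, starting from the
        # current weights, so each epoch costs O(len(xs)).
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi
                          for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err

model = TinyLogisticModel(n_features=2)

# First set: 1 = thumbs up, 0 = thumbs down (toy data).
x1, y1 = [[1.0, 0.0], [0.0, 1.0]], [1, 0]
model.fit(x1, y1)
weights_after_first = list(model.w)

# Some time later: a second set that was unknown during the first fit.
x2, y2 = [[0.9, 0.2], [0.1, 0.8]], [1, 0]
model.fit(x2, y2)
```

The second call never touches x1, so the update costs O(M) in the terms used earlier in the thread. Whether a real network keeps its accuracy on the first set after such an update is a separate question that this sketch does not address.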