Anomaly Detection

gohar · December 21, 2016, 12:36pm

I want to detect “anomalous” images in a series of images. The images may be plots or graphs, and some of them might deviate significantly from the others. What techniques could I use to estimate, with some probability, whether an image differs from most of the images seen so far?

coderama · December 21, 2016, 7:19pm

Alright, I have been lingering in these forums for two months, so it's time to make my first post.

I haven't worked in this area, so I'm thinking out loud. If all the images are generally of the same type (e.g., one type of object, or brain MRI scans):

(1) Train an autoencoder, whose target output is the same as its input.
https://en.wikipedia.org/wiki/Autoencoder
(2) Run all your images through the first half of this autoencoder (the encoder) to get each image's vector representation (the middle layer of the autoencoder).
(3) Calculate the average distance of each image's vector w.r.t. the vectors of all other images.

Images whose average distance is much higher than usual are likely to be anomalous.
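Here is a minimal, numpy-only sketch of the three steps above, assuming the images are equal-sized grayscale arrays flattened into rows of `X`. To keep it dependency-free and deterministic, it swaps the usual trained (convolutional) autoencoder for a *linear* autoencoder: under squared-error loss its optimal solution projects onto the same top-k subspace that PCA finds, so the weights are computed in closed form with an SVD instead of by gradient descent. All function names, the synthetic data, and the 3-standard-deviation cutoff are illustrative assumptions.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Step 1: fit a k-dim linear autoencoder to rows of X (n_images, n_pixels).

    The squared-error optimum projects onto the top-k principal subspace,
    so the weights come straight from an SVD of the centered data.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W_enc = Vt[:k].T  # (n_pixels, k): image -> code
    W_dec = Vt[:k]    # (k, n_pixels): code -> reconstruction
    return mean, W_enc, W_dec


def encode(X, mean, W_enc):
    """Step 2: middle-layer (bottleneck) representation of each image."""
    return (X - mean) @ W_enc


def decode(codes, mean, W_dec):
    """Second half of the autoencoder; useful for checking reconstructions."""
    return codes @ W_dec + mean


def average_distance_scores(codes):
    """Step 3: mean Euclidean distance from each code to every other code."""
    sq = np.sum(codes ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * codes @ codes.T
    dist = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding
    np.fill_diagonal(dist, 0.0)          # ignore each image's distance to itself
    return dist.sum(axis=1) / (len(codes) - 1)


def flag_anomalies(scores, n_std=3.0):
    """Indices whose score is more than n_std standard deviations above the mean."""
    return np.flatnonzero(scores > scores.mean() + n_std * scores.std())


# Synthetic stand-in data: "normal" images lie near a 4-dim subspace.
rng = np.random.default_rng(0)
n, n_pixels, k = 200, 64, 4
basis = rng.standard_normal((k, n_pixels))
X = rng.standard_normal((n, k)) @ basis + 0.05 * rng.standard_normal((n, n_pixels))
X[[3, 50, 120]] += 20 * basis[0]  # three hypothetical anomalous images

mean, W_enc, W_dec = fit_linear_autoencoder(X, k)
codes = encode(X, mean, W_enc)
scores = average_distance_scores(codes)
print(flag_anomalies(scores))
```

The pairwise-distance matrix is O(n²) in memory, which is fine for a few thousand images; for larger collections you would score against a sample or use a nearest-neighbour index instead.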

I am sure there must be better ways. I see a few links on the subject (haven’t read them yet):
https://www.linkedin.com/pulse/using-deep-learning-anomaly-detection-radiological-images-pip-curtis
http://www.mathtube.org/lecture/video/deep-learning-image-anomaly-detection

jeremy · December 22, 2016, 2:15pm

@coderama’s first post is an excellent one - thanks for de-cloaking!

@gohar it’s better to use a “real” loss function instead of an autoencoder if possible, since that way you can be sure the features used to detect anomalies are the ones you care about. Are there any labels you could use?

You may also be interested in looking at triplet and siamese networks: https://arxiv.org/abs/1412.6622
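To make the triplet idea concrete, here is a minimal numpy sketch of the margin-based triplet loss commonly used for metric learning (the FaceNet-style hinge form on squared distances, not necessarily the exact formulation in the linked paper). In practice the embeddings would come from a CNN and the loss would be minimized in a deep learning framework; the function name and `margin=1.0` are illustrative assumptions.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss over a batch of embeddings, each (batch, dim).

    Penalizes triplets where the anchor is not at least `margin` closer
    (in squared Euclidean distance) to the positive than to the negative.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[0.0, 0.0], [2.0, 0.0]])
n = np.array([[3.0, 0.0], [0.0, 1.0]])
print(triplet_margin_loss(a, p, n))  # 2.0: first triplet is satisfied, second costs 4.0
```

Once a network has been trained this way, images of the same kind end up close together in embedding space, so an image whose embedding is far from all the normal ones (e.g. by the average-distance score suggested earlier in the thread) is a candidate anomaly.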

Are you able to share your dataset? I’m interested in teaching anomaly detection in part 2, so perhaps we could use your problem as a real world use case?

SrinivasVishal7 · February 23, 2018, 11:17am

It's actually simple: train a OneClassSVM or LSAnomaly model on VGG16 or ResNet feature vectors. For prediction, do the same: extract the feature vector of the image you want to check and pass it to the OneClassSVM.
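A minimal sketch of this recipe with scikit-learn's `OneClassSVM`. Feature extraction is not shown: the random arrays below are hypothetical stand-ins for pooled VGG16/ResNet activations, and the helper names, `nu=0.05`, and the scaling step are assumptions rather than part of the original suggestion.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_one_class_model(train_features, nu=0.05):
    """Fit a one-class SVM on feature vectors of 'normal' images only.

    `nu` is an upper bound on the fraction of training errors and a lower
    bound on the fraction of support vectors.
    """
    model = make_pipeline(
        StandardScaler(),  # put every feature dimension on a comparable scale
        OneClassSVM(kernel="rbf", nu=nu, gamma="scale"),
    )
    model.fit(train_features)
    return model


def predict_anomalies(model, features):
    """Boolean mask: True where an image looks anomalous."""
    # OneClassSVM.predict returns +1 for inliers and -1 for outliers.
    return model.predict(features) == -1


rng = np.random.default_rng(0)
train = rng.standard_normal((300, 128))          # hypothetical normal-image features
normal_new = rng.standard_normal((20, 128))      # more normal images
odd_new = rng.standard_normal((5, 128)) + 6.0    # hypothetical odd images
model = fit_one_class_model(train)
print(predict_anomalies(model, odd_new))
```

If you want a graded score rather than a yes/no answer, `model.decision_function(features)` returns a signed value that is lower for more unusual images.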

skbisoi · August 13, 2018, 5:46pm

Hi Jeremy…

I'm still in Part 1, so my apologies if I'm not yet eligible to ask Part 2 questions.

I have a project requirement in the oil and gas industry: I need to detect anomalies, such as leaks or spills from pipes or equipment, in live or still images.

Unfortunately, I don’t have any data.
Should I use a supervised CNN or an unsupervised approach?
Which dataset should I use for modelling, given that the COCO dataset covers everyday objects?

I've added a link to a video from the MS Build 2018 conference to clarify the requirement.

The same applies to video analytics, where we need to determine from the video whether there is an oil leak or a gas leak.

How should I get started, and what are the best models and datasets for the problems above?

itaishch · January 4, 2019, 1:51pm

Hey everyone, I've had some experience with this problem, so this looks like a good place for my first post. I had a similar case at work, where we tried to determine whether an image was clean enough for our classification method (acting as a safety filter). Our specific case required the object to be completely visible.
Because of the variety of noise in the images (the photos were taken underwater, so there were blobs, some distortion, discolored parts and more), we initially took an unsupervised one-class approach, trying methods such as VAE and Efficient-GAN. It was my first time in the unsupervised domain (apart from some fairly simple GAN work at university), so take that into consideration, but TL;DR - it didn't work. The methods could distinguish our one class from inherently different images, but weren't sensitive enough to notice subtle differences. We also found evidence from several sources that this problem still has some way to go for complex images, e.g. the excerpt from this under-review article.

We ended up using histograms to catch the “big” noise, plus supervised classification with heavy augmentation to do a decent (though not perfect) job on the subtler noise, which we deemed good enough.
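The post doesn't say how the histogram check worked, so this is only a hypothetical illustration of the idea: compare each image's grayscale histogram with the mean histogram of known-clean images using a chi-squared distance, and flag images whose distance exceeds a threshold taken from the clean set. The bin count, distance measure, quantile, and class name are all assumptions, not the poster's settings.

```python
import numpy as np


def intensity_histogram(image, bins=32):
    """Normalized grayscale histogram of an image with pixel values in [0, 1]."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()


def chi2_distance(h1, h2, eps=1e-10):
    """Symmetric chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


class HistogramFilter:
    """Hypothetical 'big noise' filter built from known-clean images."""

    def __init__(self, clean_images, bins=32, quantile=0.99):
        hists = np.array([intensity_histogram(im, bins) for im in clean_images])
        self.bins = bins
        self.reference = hists.mean(axis=0)
        dists = np.array([chi2_distance(h, self.reference) for h in hists])
        # Threshold taken from how far clean images typically drift.
        self.threshold = np.quantile(dists, quantile)

    def is_noisy(self, image):
        h = intensity_histogram(image, self.bins)
        return bool(chi2_distance(h, self.reference) > self.threshold)


rng = np.random.default_rng(0)
clean = [np.clip(rng.normal(0.5, 0.1, (64, 64)), 0.0, 1.0) for _ in range(50)]
filt = HistogramFilter(clean)
blob = np.clip(rng.normal(0.5, 0.1, (64, 64)), 0.0, 1.0)
blob[:32, :] = 1.0  # a synthetic bright blob covering half the frame
print(filt.is_noisy(blob))
```

A global histogram ignores where pixels are, which is why it only suits large, obvious artifacts; subtler defects need something spatially aware, such as the supervised classifier mentioned above.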