Predicting risk with only positive examples

I’m curious if anyone has done work related to this.

I have a tabular dataset of worksite injuries that includes some background information (company/datetime/location/weather/etc.). However, this data only includes positive examples (occurrences of incidents). It would be interesting to try to predict the probability of imminent injury on future, non-incident data.

I was considering possibly trying an autoencoder and measuring reconstruction error. Is anyone aware of approaches for this category of modeling problems?
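Roughly the kind of thing I had in mind, just as a sketch (the feature count and data here are placeholders; I would obviously encode and scale the real tabular columns first):

```python
import torch
import torch.nn as nn

n_features = 16  # placeholder: number of encoded tabular features

# Simple autoencoder with a small bottleneck
autoencoder = nn.Sequential(
    nn.Linear(n_features, 8), nn.ReLU(),
    nn.Linear(8, 4), nn.ReLU(),   # bottleneck
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, n_features),
)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder for the real incident records (positives only)
X_train = torch.randn(1000, n_features)

for epoch in range(50):
    optimizer.zero_grad()
    recon = autoencoder(X_train)
    loss = loss_fn(recon, X_train)
    loss.backward()
    optimizer.step()

# Score new rows by reconstruction error; higher error means the row
# looks less like the incident data the model was trained on.
with torch.no_grad():
    X_new = torch.randn(10, n_features)  # placeholder for future data
    errors = ((autoencoder(X_new) - X_new) ** 2).mean(dim=1)
```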

What you’ve just described is called anomaly detection in ML. There are many approaches to solving it, including k-NN and the autoencoder method you just described. You can learn more about using neural networks for anomaly detection here: https://arxiv.org/abs/1802.06360.
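A minimal sketch of the k-NN flavour (the feature matrix here is a placeholder): score each new row by its distance to the nearest known incidents.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder for the encoded incident records
X_incidents = np.random.rand(500, 10)

nn_model = NearestNeighbors(n_neighbors=5).fit(X_incidents)

# Mean distance to the 5 nearest incidents: larger = less like any known incident
X_new = np.random.rand(20, 10)
distances, _ = nn_model.kneighbors(X_new)
scores = distances.mean(axis=1)
```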

Andrew Ng also covers anomaly detection in his Coursera Machine Learning course, though his approach does not use a neural network.


AFAIK you need negative examples, otherwise you cannot draw the boundary we want the algorithm to learn. You can use anomaly detection as @Lord_jayam suggests, but even then you need to define an artificial boundary yourself (e.g. two standard deviations from the mean), which may or may not hold in the real world. You basically won’t know what works until you have negative examples.
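To make that concrete, this is the kind of arbitrary cut-off I mean (the scores here are placeholders for whatever detector you end up using):

```python
import numpy as np

# Placeholder anomaly scores from some detector
scores = np.random.rand(1000)

# The "two standard deviations" rule of thumb
threshold = scores.mean() + 2 * scores.std()
flagged = scores > threshold

# Without any negative examples there is no way to validate that this
# cut-off corresponds to real-world risk; it is purely a convention.
```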


I have used several anomaly detection algorithms from scikit-learn, using ResNet bottleneck activations as features (for images). The results were not great for me, but you could give it a try for your case.
https://hackernoon.com/one-class-classification-for-images-with-deep-features-be890c43455d
Isolation Forest gave me the best results.
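Roughly what I did, sketched with placeholder features (swap in your own ResNet bottleneck features or encoded tabular columns):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder feature matrix for the training data
X_train = np.random.rand(500, 64)

iso = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
iso.fit(X_train)

# Score new rows: higher = more anomalous relative to the training set
X_new = np.random.rand(20, 64)
scores = -iso.score_samples(X_new)
labels = iso.predict(X_new)  # -1 = anomaly, 1 = inlier
```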
