Predicting risk with only positive examples

I’m curious if anyone has done work related to this.

I have a tabular dataset of worksite injuries that includes some background information (company/datetime/location/weather/etc.). However, this data only includes positive examples (occurrences of incidents). It would be interesting to try to predict the probability of imminent injury on future, non-incident data.

I was considering possibly trying an autoencoder and measuring reconstruction error. Is anyone aware of approaches for this category of modeling problems?
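Roughly the kind of thing I had in mind, just as a sketch (the feature count and data here are placeholders; I would obviously encode and scale the real tabular columns first):

```python
import torch
import torch.nn as nn

n_features = 16  # placeholder: number of encoded tabular features

# Simple autoencoder with a small bottleneck
autoencoder = nn.Sequential(
    nn.Linear(n_features, 8), nn.ReLU(),
    nn.Linear(8, 4), nn.ReLU(),   # bottleneck
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, n_features),
)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder for the real incident records (positives only)
X_train = torch.randn(1000, n_features)

for epoch in range(50):
    optimizer.zero_grad()
    recon = autoencoder(X_train)
    loss = loss_fn(recon, X_train)
    loss.backward()
    optimizer.step()

# Score new rows by reconstruction error; higher error means the row
# looks less like the incident data the model was trained on.
with torch.no_grad():
    X_new = torch.randn(10, n_features)  # placeholder for future data
    errors = ((autoencoder(X_new) - X_new) ** 2).mean(dim=1)
```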

What you’ve just described is called anomaly detection in ML. There are many approaches to solving it, including k-NN and the autoencoder method you just described. You can learn more about using neural networks for anomaly detection here: https://arxiv.org/abs/1802.06360.
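A minimal sketch of the k-NN flavour (the feature matrix here is a placeholder): score each new row by its distance to the nearest known incidents.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder for the encoded incident records
X_incidents = np.random.rand(500, 10)

nn_model = NearestNeighbors(n_neighbors=5).fit(X_incidents)

# Mean distance to the 5 nearest incidents: larger = less like any known incident
X_new = np.random.rand(20, 10)
distances, _ = nn_model.kneighbors(X_new)
scores = distances.mean(axis=1)
```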

Andrew Ng also covers anomaly detection in his Coursera Machine Learning course, though his approach does not use a neural network.


AFAIK you need negative examples, otherwise you cannot draw the boundary we want the algorithm to learn. You can use anomaly detection as @Lord_jayam suggests, but even then you need to define an artificial boundary yourself (e.g. two standard deviations from the mean), which may or may not hold in the real world. You basically won’t know what works until you have negative examples.
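To make that concrete, this is the kind of arbitrary cut-off I mean (the scores here are placeholders for whatever detector you end up using):

```python
import numpy as np

# Placeholder anomaly scores from some detector
scores = np.random.rand(1000)

# The "two standard deviations" rule of thumb
threshold = scores.mean() + 2 * scores.std()
flagged = scores > threshold

# Without any negative examples there is no way to validate that this
# cut-off corresponds to real-world risk; it is purely a convention.
```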


I have used several anomaly detection algorithms from scikit-learn, using ResNet bottleneck activations as features (for images). The results were not great for me, but you could give it a try for your case.
https://hackernoon.com/one-class-classification-for-images-with-deep-features-be890c43455d
Isolation Forest gave me the best results.
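Roughly what I did, sketched with placeholder features (swap in your own ResNet bottleneck features or encoded tabular columns):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder feature matrix for the training data
X_train = np.random.rand(500, 64)

iso = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
iso.fit(X_train)

# Score new rows: higher = more anomalous relative to the training set
X_new = np.random.rand(20, 64)
scores = -iso.score_samples(X_new)
labels = iso.predict(X_new)  # -1 = anomaly, 1 = inlier
```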
