Predicting risk with only positive examples

(Austin) #1

I’m curious if anyone has done work related to this.

I have a tabular dataset of worksite injuries that includes some background information (company/datetime/location/weather/etc.). However, the data only includes positive examples (occurrences of incidents). It would be interesting to try to predict the probability of an imminent injury on future non-incident data.

I was considering trying an autoencoder and measuring the reconstruction error. Is anyone aware of approaches for this category of modeling problem?
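A minimal sketch of the reconstruction-error idea. It uses scikit-learn's `MLPRegressor`, trained to reproduce its own input through a narrow hidden layer, as a small stand-in autoencoder. The data, feature count and layer sizes are hypothetical placeholders for the encoded incident table. Note that the direction flips relative to the usual anomaly-detection setup: because the model only ever sees incidents, a low error means a new record looks like past incidents (not that it is safe), and the error is a similarity score rather than a calibrated probability.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the incident table after numeric encoding:
# 500 incidents, 8 features generated from 2 hidden factors plus noise.
latent = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 8))
X_incidents = latent @ W + 0.1 * rng.normal(size=(500, 8))

# Standardize so every feature contributes comparably to the error.
scaler = StandardScaler().fit(X_incidents)
Z_train = scaler.transform(X_incidents)

# Autoencoder: an MLP fit to map each row back to itself through a
# 3-unit bottleneck (the layer sizes are an arbitrary choice for this sketch).
autoencoder = MLPRegressor(hidden_layer_sizes=(16, 3, 16), activation="tanh",
                           max_iter=2000, random_state=0)
autoencoder.fit(Z_train, Z_train)

def reconstruction_error(X):
    """Mean squared reconstruction error per row, in standardized units."""
    Z = scaler.transform(X)
    return np.mean((autoencoder.predict(Z) - Z) ** 2, axis=1)

train_err = reconstruction_error(X_incidents)

# Score new records: one generated like the incidents, one unlike them.
X_new = np.vstack([
    rng.normal(size=(1, 2)) @ W,
    3.0 * rng.normal(size=(1, 8)),
])
new_err = reconstruction_error(X_new)
print(new_err, train_err.mean())
```

Turning these errors into an actual risk probability would still need some labelled non-incident data to calibrate against.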


(Jayam Thaker) #2

What you’ve just described is called anomaly detection in ML. There are many approaches to it, including k-nearest neighbors (KNN) and the autoencoder method you described. You can learn more about using neural networks (ANNs) for anomaly detection here:
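For the KNN route, one common sketch is to score a new record by its average distance to the k nearest past incidents. This uses scikit-learn's `NearestNeighbors`; the data and k = 5 are hypothetical choices, and features should be scaled comparably before computing distances. With incident-only training data, a small distance means "close to past incidents".

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# Hypothetical, already-scaled numeric features for 300 past incidents.
X_incidents = rng.normal(size=(300, 4))

k = 5  # number of neighbours to average over (a tuning choice)
nn = NearestNeighbors(n_neighbors=k).fit(X_incidents)

def knn_score(X_new):
    """Mean distance from each new row to its k nearest past incidents."""
    dist, _ = nn.kneighbors(X_new)  # dist has shape (n_rows, k)
    return dist.mean(axis=1)

X_new = np.array([[0.0, 0.0, 0.0, 0.0],   # in the middle of the incident data
                  [6.0, 6.0, 6.0, 6.0]])  # far from every incident
scores = knn_score(X_new)
print(scores)  # the second score is much larger
```

When scoring the training rows themselves, each row would be its own nearest neighbour at distance 0; calling `nn.kneighbors()` with no argument excludes the query point from its own neighbours.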

Andrew Ng also covers anomaly detection in his Coursera Machine Learning course, though his treatment does not use a neural network.
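The approach in that course fits a Gaussian to each feature and flags points whose density p(x) falls below a threshold ε. A numpy-only sketch of that idea with hypothetical data, working in log-density to avoid underflow. In the course ε is picked on a labelled validation set; with incident-only data there is nothing to tune it on, so the 5th-percentile cutoff below is an arbitrary assumption.

```python
import numpy as np

def fit_gaussian(X):
    """Per-feature mean and variance (features treated as independent)."""
    return X.mean(axis=0), X.var(axis=0)

def log_density(X, mu, var):
    """log p(x) = sum over features j of log N(x_j; mu_j, var_j)."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (X - mu) ** 2 / (2 * var), axis=1)

rng = np.random.default_rng(0)
# Hypothetical two-feature incident data (e.g. temperature, hours into shift).
X_incidents = rng.normal(loc=[20.0, 5.0], scale=[3.0, 1.0], size=(1000, 2))
mu, var = fit_gaussian(X_incidents)

# Arbitrary threshold: the 5th percentile of log p(x) over the training data.
log_eps = np.percentile(log_density(X_incidents, mu, var), 5)

X_new = np.array([[20.0, 5.0],   # typical of past incidents
                  [40.0, 0.0]])  # unlike any past incident
flags = log_density(X_new, mu, var) < log_eps
print(flags.tolist())  # [False, True]
```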



AFAIK you need negative examples; otherwise you cannot draw the boundary we want the algorithm to learn. You can use anomaly detection as @Lord_jayam suggests, but even then you need to define an artificial boundary yourself (e.g. 2 standard deviations from the mean), which may or may not hold in the real world. You basically won’t know what works until you have negative examples.
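To make the "artificial boundary" point concrete, here is a sketch of the mean ± 2σ rule applied to some anomaly score. The scores are hypothetical, normally distributed values. Under that assumption the rule flags roughly 4.6% of the data by construction, whatever the real injury risk is, and without negative examples there is nothing to check that cutoff against.

```python
import numpy as np

def two_sigma_bounds(scores, k=2.0):
    """Return (lower, upper) bounds k standard deviations from the mean."""
    mu, sd = scores.mean(), scores.std()
    return mu - k * sd, mu + k * sd

rng = np.random.default_rng(0)
# Hypothetical anomaly scores computed on the incident data.
scores = rng.normal(size=10_000)

lower, upper = two_sigma_bounds(scores)
flagged = (scores < lower) | (scores > upper)
print(f"flagged fraction: {flagged.mean():.3f}")  # about 0.046 for normal scores
```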


(Dien Hoa TRUONG) #4

I have tried several of the anomaly detection algorithms in scikit-learn, using the bottleneck features of a ResNet as input (for images). The results were not great for me, but they may be worth a try in your case.
Isolation Forest gave me the best results.
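For a tabular dataset like the one in the original question, scikit-learn's `IsolationForest` can be fit directly on numeric feature columns. A minimal sketch with hypothetical data: `score_samples` returns higher values for rows that look more like the training data, and `predict` returns +1 (inlier) or -1 (outlier). Since only incidents would be in the training set here, "inlier" means "resembles past incidents".

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical numeric features for 500 past incidents.
X_incidents = rng.normal(size=(500, 6))

iso = IsolationForest(n_estimators=200, random_state=0).fit(X_incidents)

X_new = np.vstack([np.zeros((1, 6)),       # centre of the incident data
                   np.full((1, 6), 8.0)])  # far outside it
scores = iso.score_samples(X_new)  # higher = more typical of the training data
labels = iso.predict(X_new)        # +1 inlier, -1 outlier
print(scores, labels)
```

Categorical columns such as company or location would need to be encoded (e.g. one-hot) before fitting.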