I’m about to launch my first machine learning application, which predicts the results of blood, urine and stool testing from health assessment questionnaire data. My training dataset had around 100k features collected from about 1,000 athletes, and my doctors are very impressed with the sensitivity and specificity of the tests. Hopefully, we’ll publish something soon.
Developing this application taught me how powerful it could be to speak with an expert before choosing an algorithm! In class, @jeremy suggested a random forest, and in the end, I had fantastic results using XGBoost.
This Deep Learning course also taught me some powerful, reusable principles, and although I didn’t end up using deep learning, I did use almost everything else in Part 1.
My next application will hopefully predict arrhythmias in RR intervals collected from a heart rate monitor strap worn by users of the Elite HRV app. I’ve found a public dataset with arrhythmia annotations, and my input data will look like this:
0.789 N
0.817 N
0.653 A
0.994 N
0.844 N
0.811 N
0.789 N
The first column is the RR interval (the time between successive heartbeats) and the second column is the annotation (N is normal, A is atrial premature beat). There are twelve possible output classes to predict.
My job is to predict the annotations from the stream of RR intervals output by the HR monitor strap.
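One way to frame this, as a minimal sketch rather than a settled design: parse the stream into intervals and labels, then give each beat a small window of neighbouring RR intervals as its features. The names `parse_rr` and `make_windows` and the `half_width` parameter are my own placeholders, and the centred window is only an assumption about what context is useful.

```python
from typing import List, Tuple

# Sample copied from the post: alternating RR interval and beat annotation.
SAMPLE = "0.789 N 0.817 N 0.653 A 0.994 N 0.844 N 0.811 N 0.789 N"


def parse_rr(text: str) -> Tuple[List[float], List[str]]:
    """Split a whitespace-separated stream into RR intervals and labels."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating interval/annotation tokens")
    intervals = [float(t) for t in tokens[0::2]]  # even positions: intervals
    labels = tokens[1::2]                         # odd positions: annotations
    return intervals, labels


def make_windows(
    intervals: List[float], labels: List[str], half_width: int = 1
) -> List[Tuple[List[float], str]]:
    """Build one (window, label) example per beat.

    Only beats with `half_width` neighbours on both sides are kept
    (hypothetical choice; padding the edges is another option).
    """
    examples = []
    for i in range(half_width, len(intervals) - half_width):
        window = intervals[i - half_width : i + half_width + 1]
        examples.append((window, labels[i]))
    return examples


intervals, labels = parse_rr(SAMPLE)
examples = make_windows(intervals, labels, half_width=1)
for window, label in examples:
    print(window, label)
```

Each window could then be passed to any classifier (XGBoost, for example). A centred window needs `half_width` future beats, so on a live stream the prediction for a beat would lag by that many beats.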
Any suggestions for algorithms would be greatly appreciated! Thanks in advance!