Arrhythmia detection

chris · June 9, 2017, 12:37am

I’m about to launch my first machine learning application that predicts the results of blood, urine and stool testing using health assessment questionnaire data. My training dataset was around 100k features collected from about 1,000 athletes, and my doctors are very impressed with the sensitivity and specificity of the tests. Hopefully, we’ll publish something soon.

Developing this application taught me how powerful it could be to speak with an expert before choosing an algorithm! In class, @jeremy suggested a random forest, and in the end, I had fantastic results using XGBoost.

This Deep Learning course also taught me some powerful, reusable principles, and although I didn’t end up using deep learning, I did use almost everything else in Part 1.

My next application will hopefully predict arrhythmias in RR intervals collected from a heart rate monitor strap worn by users of the Elite HRV app. I’ve found a public dataset with arrhythmia annotations, and my input data will look like this:

0.789	N
0.817	N
0.653	A
0.994	N
0.844	N
0.811	N
0.789	N

The first column is the RR interval (the time between successive heartbeats) and the second column is the annotation (N is normal, A is atrial premature beat). There are twelve possible output classes to predict.

My job is to predict the annotations from the stream of RR intervals output by the HR monitor strap.
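A rough sketch of how the task above could be framed as supervised learning, assuming the data arrives as tab-separated `interval<TAB>label` lines like the sample. The function names (`parse_rr`, `make_windows`) and the window size are made up for illustration, not taken from any library or from my actual pipeline:

```python
from typing import List, Tuple

# The seven sample rows from the post, tab-separated.
SAMPLE = """0.789\tN
0.817\tN
0.653\tA
0.994\tN
0.844\tN
0.811\tN
0.789\tN"""


def parse_rr(text: str) -> Tuple[List[float], List[str]]:
    """Parse 'interval<whitespace>label' lines into two parallel lists."""
    intervals, labels = [], []
    for line in text.strip().splitlines():
        rr, label = line.split()
        intervals.append(float(rr))
        labels.append(label)
    return intervals, labels


def make_windows(intervals, labels, half_width=2):
    """Pair each beat's label with the RR intervals surrounding it.

    Beats too close to either end to fill a whole window are skipped.
    half_width is an arbitrary illustrative choice.
    """
    X, y = [], []
    for i in range(half_width, len(intervals) - half_width):
        X.append(intervals[i - half_width : i + half_width + 1])
        y.append(labels[i])
    return X, y


intervals, labels = parse_rr(SAMPLE)
X, y = make_windows(intervals, labels, half_width=2)
for window, label in zip(X, y):
    print(window, label)
```

Each row of `X` could then go to any classifier. Note that a centred window looks a couple of beats into the future, so on a live stream the prediction for a beat would lag by `half_width` beats, or the window would have to be trailing only.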

Any suggestions for algorithms would be greatly appreciated! Thanks in advance!

jeremy · June 12, 2017, 5:28pm

Thanks for the update! Please do keep us updated with how things go.

burgalon · June 15, 2017, 6:57pm

Hi Chris,

Your app for predicting results of blood+ sounds amazing. Could you share how you managed to collect such a large amount of data? Did you use any automation tools for ingesting scans of blood results?

The arrhythmia prediction sounds most interesting.

chris · June 16, 2017, 4:41pm

Hi Alon, I admit collecting the blood+ data was not easy. I quit my day job as a programmer at a hedge fund and started an online clinic with a local medical doctor. Later we were joined by another doctor and a research scientist. Over a three-year period, we ran lots of tests on athletes and I parsed that data from PDF documents using BeautifulSoup and loaded everything into a relational database. I then exported all the data into Pandas and the magic started!

Our interventions are mostly diet and lifestyle based and the test results play an important role in enabling behaviour change. It’ll be interesting to see if future clients still want to do all the testing or if they’d rather just save the time and money and go with what my models predicted. If they do choose to skip the real tests, it’ll be interesting to see if we get the same behaviour change and the same results. I’m not sure exactly how we’d randomise, but a clinical trial may be on the horizon.

joshgel · June 17, 2017, 10:58pm

This sounds very exciting. I’d consider using more than just the RR interval in that database, since there is so much data there. Some arrhythmias will have regular RR intervals and some will be what we call “irregularly irregular”. Detecting both types is important. This might be an interesting problem for an RNN since there is such repetition in the data. I’d be happy to chat more about what other things you could include if you want to talk more.
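To make the regular versus irregular distinction concrete, here is a minimal sketch of two hand-built features that could be computed from RR intervals alone. RMSSD is a standard heart rate variability measure; the ratio-to-local-median feature and its `lookback` value are illustrative assumptions, not a clinically validated method:

```python
import math
from statistics import median


def rmssd(intervals):
    """Root mean square of successive differences between RR intervals."""
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def relative_to_local_median(intervals, lookback=2):
    """Ratio of each RR interval to the median of the `lookback` intervals before it.

    Values well below 1 mark a beat that came early relative to its
    recent neighbours; values well above 1 mark a beat that came late.
    """
    ratios = []
    for i in range(lookback, len(intervals)):
        ratios.append(intervals[i] / median(intervals[i - lookback : i]))
    return ratios


# The RR intervals from the sample in the first post.
rr = [0.789, 0.817, 0.653, 0.994, 0.844, 0.811, 0.789]
print("RMSSD:", round(rmssd(rr), 3))
print("ratios:", [round(r, 2) for r in relative_to_local_median(rr)])
```

On the sample data the first ratio (for the beat annotated A) is well below 1 and the next is well above 1, which is the short-then-long pattern visible in the raw numbers. A perfectly regular rhythm would give an RMSSD of zero, so a feature like this says nothing about arrhythmias that keep regular RR intervals.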

Josh
a physician learning machine-learning

chris · June 18, 2017, 1:41am

Hi, Josh, the problem is the commercially available heart rate monitor straps only collect RR intervals. One of the biggest barriers to getting users to monitor their HRV is the chest strap itself and I think the future is something like the ŌURA ring or maybe a simple finger trap.

Yes! I’ve been going back through the Part 1 lessons on RNNs and thinking about how they might be applicable. Researcher Peter Backx pointed out on my podcast that a previous episode of afib is a very strong predictor of future afib, so it would make sense for the model to have state.
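By "state" I mean something like the hidden vector of a recurrent cell, which is carried from one beat to the next. A toy sketch in plain Python, assuming a single-input Elman-style cell with random, untrained weights (the class name `TinyRNNCell` and the hidden size are hypothetical, and a real model would use a trained recurrent layer from a deep learning library plus an output layer over the annotation classes):

```python
import math
import random


class TinyRNNCell:
    """One-input Elman-style recurrent cell with untrained random weights."""

    def __init__(self, hidden_size=4, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Input-to-hidden weights, one per hidden unit.
        self.w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        # Hidden-to-hidden weights: this is what lets past beats matter.
        self.w_rec = [
            [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(hidden_size)
        ]
        self.bias = [0.0] * hidden_size

    def step(self, rr, hidden):
        """Return the new hidden state given one RR interval and the old state."""
        new_hidden = []
        for j in range(self.hidden_size):
            total = self.w_in[j] * rr + self.bias[j]
            total += sum(
                self.w_rec[j][k] * hidden[k] for k in range(self.hidden_size)
            )
            new_hidden.append(math.tanh(total))
        return new_hidden


cell = TinyRNNCell()
hidden = [0.0] * cell.hidden_size
for rr in [0.789, 0.817, 0.653, 0.994]:
    hidden = cell.step(rr, hidden)  # state persists from beat to beat
    print(rr, [round(h, 3) for h in hidden])
```

The same RR value fed in after different histories produces a different hidden state, which is the property that would let a trained model treat a beat differently when it follows a run of abnormal beats.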

Let’s talk!

shushi2000 · June 18, 2017, 3:33am

Hi Chris, I collected some heart rate data from taxi drivers in one of my previous research projects. Not sure if it is relevant to your study but here is the publication:

I am curious about what modeling method you end up using.

chris · July 15, 2017, 1:32am

https://stanfordmlgroup.github.io/projects/ecg/