To pull in my letter:
As the director of the Insider Threat program at AIG, I am creating a global program that prevents our data from being stolen. Most solutions involve monitoring different data and behaviors of employees to determine their risk to the company. There are many solutions sold by vendors that claim to have AI, machine learning, and other buzzwords. However, I decided to get some hands-on experience so that I could ask vendors intelligent questions to make sure these solutions are effective, don’t create a bias against minorities, and don’t take in private employee information that isn’t needed (most pull in too much).
I found the Fast.AI MOOC to be the most useful teaching module for my development. For nine weeks, I followed the MOOC schedule of a lesson a week. My Python was a little rough, so I probably spent 20 hours a week instead of the recommended 10, but I got through it. The last two weeks culminated with an attempt at a Kaggle competition (Web Traffic). I have a post about it here.
While it has more than met my original objectives, this course has also changed my perspective on what I want to do. The insider threat is neat, but lesson 14 discussed data scientists working with pediatricians in the PICU to create better outcomes. My wife is a pediatrician, my father-in-law a pathologist, and I am constantly wondering how I can take what I have learned to apply it to their highly specialized work. In December I plan on starting the process with the University of Houston Texas Medical Center to get patient data to help diagnose kidney biopsies.
Very excited to get started soon!
Edit: I wanted to add some clarification since the tweet about this post went out and a co-worker asked if I was leaving the company. So here is some more information, which I would prefer to stay in the private forums.
While I work at an insurance company, I am not involved with our insurance products (for now). I prevent employees from stealing company secrets and intellectual property. We conduct forensics on employees who are doing strange things with company computers and assets. Although the monitoring is completely legal and all employees agree to it, employees continue to steal from companies, and it is a problem many companies don’t want to talk about.
I have been trying to make this program better by evaluating third-party vendors. The first round of the MOOC helped me ask these vendors better questions about how they use machine learning and try to ensure we are not opening ourselves up to liability by selecting a solution that uses data unethically. For example, imagine using race as a possible feature for a model detecting criminal behavior. Terrifying. Yet at least one vendor stated they use unsupervised learning on every field from HR data.
The kidney work is still pending. My father-in-law is a pathologist and has about five years of slides from his cases. This research would not replace my job at AIG, as I need it to support my family. It is a fascinating classification problem, and I am hoping not to run into too many hurdles with my day job and healthcare regulations.