I also have a project with clinical notes at my institution, and I want to move to MIMIC III so we can all use the same baseline of data to compare our progress.
In my original work, I was trying to identify cases where the patient had a current illness of “shingles”. My thought was to use the basic ULMFiT approach: train a language model on our larger corpus of notes, then use transfer learning and 100 annotations to detect “shingles”. This problem is of interest to some colleagues in the Rheumatology department who work with immune-compromised patients, and those patients have a high incidence of shingles. So if I could make it work, one clinical research team would be happy.
To make the problem more realistic, I limited my fine-tuned classifier to cases in which shingles-like substrings were found in the text (via our SQL database). It is easy to identify candidate cases this way, because our SQL database is plenty competent at finding substrings. In this case I included any note with one of the following substrings: “shingles”, “zoster”, or “post-herpetic neuralgia”.
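To make the filtering step concrete, here is a minimal sketch of the same substring screen in Python, assuming the notes have already been pulled out of the database as plain strings (the example notes and the `is_candidate` helper are my own illustrations, not part of any MIMIC schema):

```python
# Substrings that flag a note as a candidate "shingles" case —
# the same three patterns used in the SQL query.
CANDIDATE_PATTERNS = ["shingles", "zoster", "post-herpetic neuralgia"]

def is_candidate(note_text: str) -> bool:
    """Return True if the note mentions any shingles-like substring."""
    text = note_text.lower()
    return any(p in text for p in CANDIDATE_PATTERNS)

# Toy notes for illustration only
notes = [
    "HPI: 62yo M presents with painful vesicular rash, concerning for shingles.",
    "PMH: hypertension, diabetes. No acute complaints today.",
    "Immunizations: zoster vaccine administered 2019.",
]
candidates = [n for n in notes if is_candidate(n)]
```

Note that this screen deliberately over-matches (vaccinations, lab tests, old history all pass); the classifier's job is to sort those apart.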
When I manually created the labels, I found that in the above set of cases, the most common classes were:
- Shingles in the current illness. Most often I knew it was the current illness because it was discussed in the “history of present illness” section, or the date of “shingles” was within a few days of the date of the clinical note.
- Past medical history of shingles. These were in the “past medical history” section and/or had dates far in the past (at least a month or more).
- Post-herpetic neuralgia. This is a long-lasting complication of shingles, and a nightmare in its own right. If you get shingles while your immunity is weak (especially older people or immune-compromised people), the rash goes away, but not the pain. The pain can be fairly extreme, it can last for months or years, and there is no medication that really makes it go away. But this isn’t what my researchers were looking for, so I counted it as a separate (negative) category, since by definition it happens weeks or months later.
- Shingles vaccination. Many docs chart that they recommend a shingles (or zoster) vaccination for their patient. I also used this class for people who actually got a shingles vaccination; this was usually recorded in a section called “immunizations” or something similar, alongside a list of the other vaccinations they have had.
- Shingles lab test. Some very immunocompromised people had a blood test for evidence of the zoster virus in their blood (antibodies or virus particles).
With these labels on 195 patients, I trained the model on 80% and validated on 20%.
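The split itself is the standard shuffle-and-slice; a minimal sketch with placeholder data (the `labeled_notes` list is hypothetical, standing in for the 195 annotated patients):

```python
import random

random.seed(42)  # reproducible split

# Hypothetical stand-in for 195 (note_text, label) pairs
labeled_notes = [(f"note {i}", i % 5) for i in range(195)]

random.shuffle(labeled_notes)
split = int(0.8 * len(labeled_notes))  # 156 train / 39 validation
train, valid = labeled_notes[:split], labeled_notes[split:]
```

One caveat worth flagging: if any patient contributes more than one note, splitting by patient ID rather than by note avoids leaking a patient's language from train into validation.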
When I did the basic fine-tuning and classification phase, the model wasn’t very good at finding shingles as the “current illness”. When I looked at the “attention” head, most of the time it wasn’t even focusing on the mention of shingles. The problem is that I just said “learn to predict these labels” without telling the model that I was looking for “shingles”. Each history-and-physical note often mentioned 100-200 additional medical problems, and I gave the model nothing to tell it to focus on the shingles/zoster phrases.
On my second try, I’m going to need to tell the model:
- Look for mentions of the concept “shingles”
- If you find the concept, make note of the section of the document in which you found it
- If possible, find a date for the concept, and compare it with the date of the note, so you will know whether it is a “current” case or a case from the past.
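The third step, comparing the concept’s date to the note’s date, can be sketched as a simple heuristic (the seven-day window and the function name are my own rough assumptions, not a clinical standard):

```python
from datetime import date

def classify_recency(concept_date: date, note_date: date,
                     current_window_days: int = 7) -> str:
    """Label a concept mention as 'current' or 'history' by comparing
    its date to the note's date (window size is a rough heuristic)."""
    delta = (note_date - concept_date).days
    return "current" if 0 <= delta <= current_window_days else "history"

# A rash that started 3 days before the visit reads as "current";
# an episode from the previous year reads as "history".
recent = classify_recency(date(2023, 5, 2), date(2023, 5, 5))
old = classify_recency(date(2022, 6, 1), date(2023, 5, 5))
```

The hard part, of course, is getting `concept_date` out of free text in the first place, especially for approximate phrases like “last summer”.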
This made me realize something that has already been discussed in the literature. To build a predictive model that understands a clinical note, it should determine the following things about the note:
- It should identify mentions of clinical concepts (diseases, signs, symptoms, lab tests and their values, procedures, etc.)
- It should know in what section of the note those concepts were mentioned. Some typical sections are chief complaint, history of present illness, past medical history, family history, immunizations (if present), physical exam, assessment/impression, and plan (for care management, testing, treatment). These sections are unreliably labeled, so you need a chunk of the network to match the pattern common to each of these sections
- It should detect uncertainty about the observed concept. For instance, if a patient has a story that is consistent with a myocardial infarction, but there is no definitive proof, the providers will admit the patient to the ICU and say “Rule Out Myocardial Infarction”, or just “R/O MI”. This means they don’t know for sure, but the risk of death if they discharge the patient home with an MI is so great that they admit the patient with the concept that they “might have an MI”, which needs to be “ruled out” with further tests and time before it is safe for them to go home. There is a lot of uncertainty in medicine, and it is important for the reader to understand the terms that reflect uncertainty.
- If possible, it should identify the date associated with that concept, including approximate dates like “last summer”, “2015”, etc.
- It should flag a “negative” mention of the concept, e.g., “no chest pain”. When the provider affirms that something was not found, it means the question was asked and the answer was “no”. These questions are usually asked about clinical concepts that are very important in diagnosing the patient’s problem or level of severity, so they shouldn’t be treated as “NULL” or no mention.
- In the family history, there may be a mention of the father dying of cancer, etc. It is important to note that this concept (cancer) applies not to the patient but to a family member.
- If the concept is a lab test, it should identify the value of that lab test (as distinct from the values of other lab tests).
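Even before a learned model, the section problem above can be bootstrapped with a crude header-matching baseline. Here is a minimal sketch; the header names are my guesses at typical headings (they vary widely between institutions, which is exactly why a learned component is needed):

```python
import re

# Rough patterns for common section headers; real notes vary a lot.
SECTION_HEADER = re.compile(
    r"^(chief complaint|history of present illness|past medical history|"
    r"family history|immunizations|physical exam|assessment|plan)\s*:",
    re.IGNORECASE | re.MULTILINE,
)

def split_sections(note_text: str) -> dict:
    """Split a note into {section_name: body} by matching header lines."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(note_text))
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note_text)
        sections[m.group(1).lower()] = note_text[start:end].strip()
    return sections

note = """Chief Complaint: painful rash
History of Present Illness: 62yo M with vesicular rash x3 days.
Past Medical History: shingles (2015), hypertension.
"""
secs = split_sections(note)
```

A baseline like this only catches cleanly-labeled sections; the unreliably-labeled ones are where the network earns its keep.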
The model should be prepared to deal with 100-200 such concepts in the same clinical note.
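Pulling the properties above together, the extraction target for each concept could be represented something like this (the field names are my own sketch, not an established schema):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ConceptMention:
    """One extracted clinical concept and the properties discussed above."""
    concept: str                         # e.g. "shingles"
    section: Optional[str] = None        # e.g. "past medical history"
    negated: bool = False                # "no chest pain" -> True
    uncertain: bool = False              # "R/O MI" -> True
    family_member: bool = False          # father's cancer, not the patient's
    mention_date: Optional[date] = None  # may come from an approximate phrase
    lab_value: Optional[str] = None      # only for lab-test concepts

# Example: "R/O MI" charted on admission
mention = ConceptMention(concept="myocardial infarction", uncertain=True)
```

A note would then yield a list of 100-200 of these, one per concept, which is also a convenient shape for hand-labeling.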
These properties are easy for a provider reading the note to find, but hard for a computer to identify, and hard to associate with the concept each property belongs to. So this is a perfect problem for deep learning, and I’m very excited that with the tools we have, we (or somebody) can build a model that can do this.
The current “state of the art” is that most algorithms use keywords and rules, and they do pretty well at identifying the concepts mentioned in the note (60% to 80% accuracy), but the accuracy on identifying these other properties, including dates, is very low. If a net could understand these things at a human-level, this would unlock the majority of clinical information in the electronic medical record, and make many downstream predictions possible.
For anyone who finds this fun and rewarding: I would like to hand-label a bunch of clinical notes in MIMIC III for each of the issues above, and I welcome everyone who can register for access to MIMIC III to work on building a network that can extract these concepts and properties. I am a physician, just barely and not practicing, but I believe I can get support from expert physicians at my institution to make this a meaningful exercise. My goal is that we would publish anything that the experts find useful, and include the contributors as co-authors. In addition, any of you would be free to publish or blog about your experiences, and this would potentially be useful in advancing your reputation in DL and/or clinical DL.
The MIMIC web site has a collection of challenges already listed. I believe that they encourage people to create their own challenges that can be hosted and publicized on the web site. I think it would be cool if we could do that, and participate as either several teams or several individuals, as meets your personal preferences. If we can do this, we can create a performance baseline on MIMIC III, on which anyone can build, both now and in the future, and this can advance the state of the art in this field.