ULMFiT - Medical/Clinical Text

SNOMED-CT - excitement and misery
I’m trying to familiarize myself with SNOMED-CT, one of the most widely used ontologies/dictionaries for structuring clinical observations and diseases. I am focusing on it because the other popular ontology, ICD10, doesn’t really cover the low-level signs and symptoms found in clinical “History and Physical” documents. That is understandable, since ICD10 is focused on billing and statistical reporting to regulatory agencies.

I also concluded that it is much better to get SNOMED-CT by downloading and installing UMLS from NLM than by downloading SNOMED-CT from the source organization (“http://www.snomed.org/” or “https://confluence.ihtsdotools.org/”). The reasons are:

  • UMLS has one file structure to hold a bunch of key ontologies I will need, like RxNorm for medications, LOINC for lab tests, CPT for procedures, etc. This means I can learn one set of file formats for all of them.

  • snomed.org did some annoying things, like separating the concept code into one file and the primary text description of the code into a different file. They also included the entire history of the ontology, including obsolete terms, in the same file, whereas UMLS has a tool, “MetamorphoSys”, that lets you keep only the active terms.
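
For anyone who does start from the snomed.org release files, the split between concept and description files can be worked around with a small join. This is only a minimal sketch, assuming the tab-delimited RF2 layout as I understand it (a concept file with `id` and `active` columns, a description file with `conceptId`, `typeId` and `term` columns) and assuming a Snapshot-style file with one row per component. The sample rows, the ids and the helper names (`make_tsv`, `read_rf2`, `active_fsn_by_concept`) are made up for illustration.

```python
import csv
import io

# typeId values RF2 uses for description types (assumed from the RF2 spec)
FSN_TYPE_ID = "900000000000003001"      # Fully Specified Name
SYNONYM_TYPE_ID = "900000000000013009"  # Synonym

def make_tsv(header, rows):
    """Build tab-delimited text for the in-memory sample below."""
    return "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"

def read_rf2(handle):
    """Read a tab-delimited RF2 file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))

def active_fsn_by_concept(concept_rows, description_rows):
    """Map each active concept id to its active Fully Specified Name."""
    active_ids = {row["id"] for row in concept_rows if row["active"] == "1"}
    names = {}
    for row in description_rows:
        if (row["conceptId"] in active_ids
                and row["active"] == "1"
                and row["typeId"] == FSN_TYPE_ID):
            names[row["conceptId"]] = row["term"]
    return names

# Made-up sample data, NOT real SNOMED CT content; all ids are placeholders.
CONCEPTS = make_tsv(
    ["id", "effectiveTime", "active", "moduleId", "definitionStatusId"],
    [["1001", "20200101", "1", "m", "d"],
     ["1002", "20200101", "0", "m", "d"]],  # retired concept
)
DESCRIPTIONS = make_tsv(
    ["id", "effectiveTime", "active", "moduleId", "conceptId",
     "languageCode", "typeId", "term", "caseSignificanceId"],
    [["5001", "20200101", "1", "m", "1001", "en", FSN_TYPE_ID,
      "Example finding A (finding)", "c"],
     ["5002", "20200101", "1", "m", "1001", "en", SYNONYM_TYPE_ID,
      "Example A", "c"],
     ["5003", "20200101", "1", "m", "1002", "en", FSN_TYPE_ID,
      "Retired example (finding)", "c"]],
)

NAMES = active_fsn_by_concept(
    read_rf2(io.StringIO(CONCEPTS)),
    read_rf2(io.StringIO(DESCRIPTIONS)),
)
print(NAMES)
```

With a Full release file (which carries the history), you would first need to keep only the most recent `effectiveTime` row per id before applying the `active` filter.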

Now I’ve downloaded the SNOMED-CT files, and I’m using the built-in “MetamorphoSys” software to browse the terms in tree format, using its own indexes, before loading anything into a database. This software works on Windows, Mac or Linux.

The “Misery” came first. I was looking for something simple like hearing “rales” or “rhonchi” when listening to the lungs with a stethoscope. As I remember, “rales” was a term for crackling sounds that represent fluid in the alveoli, the lowest level of the lungs, and are associated with pneumonia. “Rhonchi” was a coarser sound caused by obstruction or mucus in the higher airways, associated with bronchitis. ICD10 does not have a code for either one of these.

So I walked down the tree:

  • Observable Entity
  • -> Clinical History Examination Observable
  • -> Respiratory Observable
  • -> Respiratory Characteristics of the chest
  • -> Chest auscultation feature [meaning listening with a stethoscope to the chest]

Here is what I found:

image

Nothing more; just synonyms for those phrases. No rhonchi or rales.

Then came the excitement! I walked down the tree on a different path:

  • Clinical findings
  • -> Clinical History and Observation Findings
  • -> Respiratory auscultation findings
  • -> Added respiratory sounds

And here is what I found below that:

image

These are the little details they taught us in school, in the same language they used, which is also the language we find in a physical exam document.

If I go further down the sub-tree “Respiratory Auscultation finding”, I get even more specialized language that I have never heard of, such as “Coin Sign”. I may never see that in a clinical document, but it’s great to know that if I do see it, this ontology will have a slot for it.

image

Now the trick is to find a way to detect these finely detailed clinical observations in the clinical note. My hope/belief is that this is where the gold is. If we are ever going to discover new medical knowledge, it will require looking at our medical record data at this level of detail to find new clusters and associations of observations and treatments with outcomes.
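
As a naive first pass at that detection step, one could simply look for known term strings in the note text. The sketch below is only an illustration under strong assumptions: it does case-insensitive whole-word phrase matching against a hand-typed term list, the note text is invented, and the function names (`build_matcher`, `find_terms`) are hypothetical. It knows nothing about negation, so “no rales” still counts as a hit, and it does not map hits to any ontology code.

```python
import re

def build_matcher(terms):
    """Compile one regex that matches any of the given phrases as whole words."""
    # Longest phrases first, so a multi-word term wins over a shorter overlap
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)

def find_terms(note, matcher):
    """Return (term, start, end) for every match, in order of appearance."""
    return [(m.group(0).lower(), m.start(), m.end())
            for m in matcher.finditer(note)]

# Terms taken from the discussion above; the note is a made-up example.
TERMS = ["rales", "rhonchi", "coin sign"]
NOTE = "Lungs: scattered Rhonchi bilaterally, no rales."

HITS = find_terms(NOTE, build_matcher(TERMS))
for term, start, end in HITS:
    print(term, start, end, repr(NOTE[start:end]))
```

In practice the term list would come from the ontology's synonyms rather than being typed by hand, and a real pipeline would need negation and context handling on top of this.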

At some point, it may be better to let the model decide whether these words, in context, are important features, but SNOMED represents a way to initially focus on what many subject matter experts have designated as “atomic” features that are likely to be significant to them.

Hi @danaludwig,

Can you provide some more details about how you used SNOMED CT for your analysis and what tools/software you used to access it programmatically?

Thanks!

Hi @alohia,

SNOMED CT is a structured “ontology” or “vocabulary” or “dictionary” of medical terms that has been around for many years. It has a broad scope of medical knowledge, but is strongest in the area of diseases, signs and symptoms. I found that the easiest and most powerful way to work with SNOMED CT was to use the larger “UMLS” metathesaurus. UMLS is also an ontology of medical terms, but it is a hybrid of over 100 other terminologies, SNOMED CT being one of them. UMLS is really a wonderful resource for us that captures three types of information:

  • Concept codes - represented by a CUI or “Concept Unique Identifier” and a phrase to describe that concept. The beauty of the CUI is that it tells you what physicians and scientists think are important in medicine. In the UMLS, the curators of the vocabulary, often physicians, will try to find one common CUI to link the same concept from all 100+ ontologies into a single identifier. This is very important because the combination of knowledge from each ontology gives you a “profile” of that concept, including definitions, synonyms, etc.
  • Atomic code or AUI - any phrase from any ontology gets assigned its own unique AUI code. Then UMLS links each of those AUI codes to a parent CUI.
  • Relationship graphs - the UMLS provides a very large collection of relationships to show how the CUIs are related to each other. The most important is “is a subset of”. There are other relationship types, like “is a symptom of”, but I think the “set/subset” relations are the most complete.
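
To make the CUI/AUI link concrete for programmatic access, here is a minimal sketch of reading the UMLS concept file with plain Python. It assumes the pipe-delimited `MRCONSO.RRF` layout as I understand it (18 fields with a trailing `|`, where `SAB` is the source vocabulary and `STR` is the phrase) and that SNOMED CT atoms carry the source abbreviation `SNOMEDCT_US`. The rows, CUIs and AUIs below are placeholders rather than real UMLS content, and the helper names (`sample_row`, `parse_mrconso`) are hypothetical.

```python
import io
from collections import defaultdict

# Field order assumed from the UMLS documentation for MRCONSO.RRF
MRCONSO_FIELDS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

def sample_row(cui, aui, sab, text, lat="ENG"):
    """Build one made-up MRCONSO-style line; unused fields stay empty."""
    values = dict.fromkeys(MRCONSO_FIELDS, "")
    values.update(CUI=cui, AUI=aui, SAB=sab, STR=text, LAT=lat)
    return "|".join(values[f] for f in MRCONSO_FIELDS) + "|"

def parse_mrconso(handle, sab=None, lang="ENG"):
    """Group atoms by CUI as (AUI, SAB, STR), optionally for one source."""
    atoms_by_cui = defaultdict(list)
    for line in handle:
        values = line.rstrip("\n").split("|")
        # zip() drops the empty string that follows the trailing "|"
        row = dict(zip(MRCONSO_FIELDS, values))
        if row["LAT"] != lang:
            continue
        if sab is not None and row["SAB"] != sab:
            continue
        atoms_by_cui[row["CUI"]].append((row["AUI"], row["SAB"], row["STR"]))
    return dict(atoms_by_cui)

# Placeholder rows: two atoms from different sources sharing one CUI,
# plus a non-English atom that the default language filter skips.
SAMPLE = "\n".join([
    sample_row("C9990001", "A9990001", "SNOMEDCT_US", "Example crackles"),
    sample_row("C9990001", "A9990002", "MSH", "Example rales"),
    sample_row("C9990002", "A9990003", "SNOMEDCT_US", "Ejemplo", lat="SPA"),
]) + "\n"

ALL_ATOMS = parse_mrconso(io.StringIO(SAMPLE))
SNOMED_ONLY = parse_mrconso(io.StringIO(SAMPLE), sab="SNOMEDCT_US")
print(ALL_ATOMS)
print(SNOMED_ONLY)
```

Against a real installation you would pass an open file handle for `MRCONSO.RRF` instead of the `io.StringIO` sample. The file is large, so loading into a database or filtering by `SAB` while streaming, as above, is probably the more practical route.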

Anyone can sign up for access to UMLS. Go to this url to register for access:

https://uts.nlm.nih.gov/uts/signup-login

Dana
