NLP meetup setup / Random MIMIC-III Discussions


I posted a feeler in the chitchat earlier this week to see if there is interest in having a meetup dedicated to NLP topics, and several people expressed interest in the idea. This group would ideally lean toward the advanced side of NLP and be oriented toward discussing and trying out new ideas at the intersection of NLP and deep learning.

My Research Interests
My own particular interest at the moment is in extending transfer learning methods beyond typical text classification to other learning tasks (e.g. word tagging, seq2seq, summarization, coreference resolution), and in implementing the tools to make these tasks just as easy as text classification (building those tools ourselves if we have to).

I’m also very interested in useful data augmentation techniques for NLP, natural language generation and improved language models (e.g. GPT-2), interpretation of deep learning models for NLP, and applications of NLP to legal tech (the field I work in), medical records, and even network security (like using NLP methods for analyzing logs).

While my interests tend to be more on the applied side, I'm interested in theory as well, and am not at all afraid to get mathy (my degrees are in physics and math, after all). In particular, I'm interested in the theory of deep learning and in optimization algorithms.

Plan of Action
For those taking the course in person, I'd ideally like to set up a weekly meetup or something similar. Since many of us work during the day, this would almost certainly have to be on evenings or weekends. We can also figure out a way to set up Zoom for those who want to participate remotely.

Regarding discussions, I’d personally like to do a blend of papers we decide are interesting, mixed with practical use case discussions (so we can gauge how practical/useful an idea is), and (of course) showing the results of our own experiments on stuff we find interesting.

Relatively soon I'll set up a poll to decide on meeting times and locations. Please be flexible about meeting times, because it can be very hard to get everybody to agree on one if we're not all willing to budge at least a little.

If any of this seems interesting to you, or if you have ideas to add, please feel free to post below.

Here is some more detail on the goals I have for this meetup.


I'm interested in NLP, but I haven't been able to find many resources other than some fastai lessons. It would help a lot if people shared their ideas in a group. I would love to participate.

I'm also interested in NLP, and in transfer learning in particular: in the medical field, for example, we often only have access to a small set of clinical notes, which makes it hard to train a good model from scratch. Other areas of interest are sentiment classification (beyond just positive vs. negative), as well as language models and the ability to generate text.

@magiclantern I'm interested in a lot of the same things. I've actually been working for about a year with Berkeley Lab on using NLP on clinical notes to predict a patient's risk of suicide in the near future. We tried to do some of the same work on publicly available records (e.g. MIMIC-III) and ran into the same problems due to limited data (especially when trying to work with them as patient time series). I'd certainly be open to talking more about this in the meetups.

I am also very interested in NLP. I work at a startup that provides NLP solutions for finance, and a lot of my work is doing research on incorporating the latest and greatest ML solutions in our offerings. I live in South Bay and attend the class in person. Please keep me posted if you decide to meet up in person, or if it’s just going to be a Zoom meeting.


Cool. Interested. Thanks!


Hi, I've been working on NLP/NLU for many years, and nowadays the NLP field is like a jungle. My advice for @rkingery is to stay focused on one or two tasks/problems. It is really hard to keep yourself up to date!

For papers, why don't you check the NLP section in the thread below:

Interested as well. I'm following the course in SF, so I'm down to meet up in the afternoons/evenings.


I am also interested


@rkingery, are there other data sources where people interested in medical NLP can get their hands on some data?

I’m interested in an NLP meetup. I’m remote, though next week, March 24 - 28 I will be in San Francisco for a conference.

I've been working on other projects, but the 2019 course is inspiring me to spend some time researching and getting value out of deep learning for NLP. While my local data warehouse is getting clinical notes, I don't yet have access. I was planning to start some investigatory work with MIMIC-III.

For those who aren't familiar with MIMIC-III, see - registration and coursework (I believe it's all free) are required to get access. You may also need some sort of research institution affiliation to get access to the full data set. I got access several years ago, so I've forgotten some of the specifics.


Yeah, we started with MIMIC-III for the same reasons. The dataset we wanted access to, the MVP dataset, was taking forever to get IRB approval. The problem is that MIMIC-III is ungodly small, particularly for time series prediction, which makes it hard to carry results from that dataset over to larger ones like MVP. There's also a huge problem with noise in the clinical notes, which caused a lot of our researchers to abandon the notes in favor of the structured data. It's a good place to get your feet wet, though, I suppose.

There’s another dataset out there I believe for in-patient records, but it’s just as small, and I can’t recall the name of it at the moment.


I ran into the same problem you describe with MIMIC-III while trying to do some causal inference. Once you narrow down to a specific item to analyze, there are only a few cases to look at.

I haven’t started digging into this one yet, but it’s supposed to be much larger and cover more institutions:

Upon hearing about the eICU dataset, we thought about trying the same approach that failed with MIMIC-III, but haven't gotten to that yet. To my knowledge there are no free-text notes, just structured tables.

There is a VA hospital close to me, but I haven’t gone through the hoops to try and get access to their data - no urgent need, so I’ll avoid the red tape for now.

@rkingery I'm also in healthcare, and I've done some personal projects using the MIMIC-III dataset with sequence models. So I'm definitely down to join an SF meetup remotely (long commute).


I haven’t worked with MIMIC III extensively, but the clinical notes I saw there look like the real world of healthcare. I think “noise in the clinical notes” is the status quo for medicine. But I also think that the notes are where the information is, for the physician and hopefully for a neural network. I know one more person in this class who has access to MIMIC III; I will let him know of this thread. Also, a friend of his is playing with “SentencePiece” as a tokenizer, so this may help with the chaos of clinical notes.


I wanted to post some thoughts/findings (at the risk of turning this thread into a post about MIMIC-III; should we split it out?). I have been working in ML for a few years now and have applied it to a variety of areas. I have also been interested in using it for medical data in various ways. As a first attempt, I looked at this blog post about using bag-of-words (BOW) and thought that an LM would do far better. I have no attachment to this particular problem; I am just taking it as a first experiment for myself. I use the usual method of training a language model and then fine-tuning a classifier on top of it.

There have been some challenges, but overall I am making slow progress working with this. Here are some thoughts:

  • Yes the data is messy, but that is what real data looks like. I am glad to hear further confirmation of that observation.
  • The SpaCy tokenizer is great for a consistent corpus of words, but if you try to use it on medical data, you will get a lot of xxunk because the words are just not in the vocab. You can expand your vocab to include these words and then learn embeddings for them, but I have not had a ton of luck with that.
  • I have used SentencePiece and I find it works well (I actually also used it to build my own Wiki103 data set, and it worked fine there too). As mentioned, it can use Byte-Pair Encoding (BPE) as its subword model (strictly, the library's default model type is the unigram language model, with BPE as an option). This will break down a word into sub-parts and there are even tokens for the
  • MIMIC-III has many name/date placeholder tokens of the form [**...**]. I tried to work with these for a long time and get them tokenized, but I am having the most luck removing them entirely. I don't think particular patients matter for this problem, and the placeholders seem to confuse the model about the sequence of words that follows.
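A minimal sketch of removing those placeholders, assuming they always follow the `[**...**]` pattern and sit on a single line; `strip_deid_placeholders` is a hypothetical helper name. It drops the placeholders entirely, as described above; replacing them with a special token instead would be a one-line change.

```python
import re

# MIMIC-III de-identification placeholders look like "[**2151-7-16**]" or
# "[**Known lastname 1234**]". The non-greedy ".*?" makes adjacent
# placeholders match separately instead of swallowing the text between them.
DEID_PATTERN = re.compile(r"\[\*\*.*?\*\*\]")

def strip_deid_placeholders(text: str) -> str:
    """Remove de-identification placeholders and collapse leftover whitespace."""
    without_tags = DEID_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()

note = ("Admission Date: [**2151-7-16**] Discharge Date: [**2151-8-4**]\n"
        "Patient seen by Dr. [**Last Name (STitle) 123**].")
print(strip_deid_placeholders(note))
# Admission Date: Discharge Date: Patient seen by Dr. .
```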

I am currently experimenting with several "tricks" to see what works well. It is all in the prototype phase, but I am sharing to get ideas/feedback/criticism from others. So don't hold back!

  1. I replaced the usual last-state / max-pool / average-pool head with an attention head (as in a Transformer model) so that the model can focus on any word in the sequence and use it.
  2. The standard text classifier learner has a default max_len=20*70 (1,400 tokens) with bptt=70. This is fine for many documents, but there seems to be a long tail of tokens for the diagnosis text: that length would truncate 78% of documents. I extended it to 57*70 (3,990 tokens), and now just 6% of the text in the notes is excluded.
  3. Unbalanced classes can be handled in a few ways: weighting the classes, down-sampling the majority class, or up-sampling the minority class. I tried all three and am currently using the last one, so that all the negative cases are included. If you do not balance the outputs, you get results that seem pretty sporadic to me.
  4. Added ROC AUC as a metric so I can see how that improves over time.
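For trick 1, here is a small NumPy sketch contrasting the usual concat-pooling input (last state, max over time, mean over time) with one simple form of attention pooling: a single query vector scores each time step, the scores are softmax-normalized, and the head receives the weighted sum of hidden states. It is an illustration under assumptions (toy shapes, a random vector standing in for the learned parameter `w`, no padding mask), not the code from the post; the function names are hypothetical.

```python
import numpy as np

def concat_pooling(hidden: np.ndarray) -> np.ndarray:
    """Usual classifier-head input: [last state, max over time, mean over time].

    hidden: encoder outputs of shape (seq_len, hidden_dim).
    Returns a vector of shape (3 * hidden_dim,).
    """
    return np.concatenate([hidden[-1], hidden.max(axis=0), hidden.mean(axis=0)])

def attention_pooling(hidden: np.ndarray, w: np.ndarray):
    """Attention pooling with a single query vector w of shape (hidden_dim,).

    Each time step gets the score hidden[t] @ w; a softmax over the sequence turns
    the scores into weights, and the output is the weighted sum of hidden states.
    Returns (pooled vector of shape (hidden_dim,), weights of shape (seq_len,)).
    """
    scores = hidden @ w
    scores = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()
    pooled = weights @ hidden
    return pooled, weights

# Toy example: 12 tokens with 8-dimensional encoder states.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 8))
w = rng.normal(size=8)  # stands in for a learned parameter
pooled, weights = attention_pooling(hidden, w)
print(concat_pooling(hidden).shape, pooled.shape)  # (24,) (8,)
print(int(weights.argmax()))  # index of the token with the largest weight
```

In a real model `w` would be trained along with the rest of the head, padded positions would be masked out before the softmax, and the weights could double as a rough view of which tokens the classifier leaned on.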

I am happy to share code if others are interested. Right now it is rough, but I can clean it up. I should also mention that I have not put the MIMIC data into a database. I just use the flat files and join as-needed using pandas.
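A sketch of the flat-file approach with pandas, assuming the standard MIMIC-III column names (`SUBJECT_ID`, `HADM_ID`, `CATEGORY`, `TEXT`, `ADMITTIME`, `DISCHTIME`) and using a few made-up rows in place of `pd.read_csv` on the real NOTEEVENTS and ADMISSIONS files:

```python
import pandas as pd

# Made-up rows standing in for the MIMIC-III flat files; with real data these
# would come from pd.read_csv on ADMISSIONS.csv and NOTEEVENTS.csv.
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [100, 101, 200],
    "ADMITTIME": pd.to_datetime(["2150-01-01", "2150-03-01", "2150-02-01"]),
    "DISCHTIME": pd.to_datetime(["2150-01-05", "2150-03-04", "2150-02-09"]),
})
notes = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2, 2],
    "HADM_ID": [100, 101, 200, 200],
    "CATEGORY": ["Discharge summary", "Discharge summary", "Nursing", "Discharge summary"],
    "TEXT": ["note a", "note b", "nursing note", "note c"],
})

# Keep only discharge summaries, then attach admission/discharge times per stay.
discharge_notes = notes[notes["CATEGORY"] == "Discharge summary"]
joined = discharge_notes.merge(admissions, on=["SUBJECT_ID", "HADM_ID"], how="inner")
print(joined[["SUBJECT_ID", "HADM_ID", "DISCHTIME", "TEXT"]])
```

An inner join silently drops any notes whose `HADM_ID` is missing, which may or may not be what you want for a given task.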

With all this effort and work, I am still at an AUC of 0.71 or so. Not much better than the BOW approach. I have not lost hope, but things are challenging in this space for sure.


Hi Bobak,

I suspect that your model is better than it appears from the ROC AUC of 0.71. I know there is a lot of interest in predicting 30-day readmission because Medicare uses a low rate as a measure of high-quality clinical care and pays institutions for keeping their rate low. But I'm not convinced it reflects quality of care, or anything in particular that could be predicted from the notes. You are already 1% higher than the ROC AUC in the blog post, and the Google paper used a completely different data set. Perhaps inpatient mortality would be a better test; I'm not sure.
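For reference, ROC AUC can be read as the probability that a randomly chosen positive case (e.g. a readmitted patient) gets a higher score than a randomly chosen negative one, so 0.5 is chance and 0.71 means the model orders such a pair correctly about 71% of the time. A brute-force pure-Python sketch of that definition (the function name is hypothetical; libraries compute the same number far more efficiently):

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. O(P * N), fine for a sanity check only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```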

I like your use of SentencePiece. When I heard about it, I read part of the paper, and it made a lot of sense to have a tokenizer that does the right thing independent of the quirks of the data set.
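To make the subword idea concrete, here is a toy pure-Python BPE learner. It is not SentencePiece (a C++ library with its own trainer, whose default model type is the unigram LM rather than BPE); it only illustrates how learned merges let an unseen clinical term fall back to known pieces instead of becoming `xxunk`. The word list and function names are made up for illustration.

```python
from collections import Counter

def _merge_pair(symbols, pair):
    """Merge every non-overlapping occurrence of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(words, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent symbol pair.

    Each word starts as its characters plus an end-of-word marker "</w>".
    """
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return symbols

corpus = ["hypertension", "hypotension", "tension",
          "hyperglycemia", "hypoglycemia", "glycemia"]
merges = learn_bpe(corpus, 30)
print(apply_bpe("hypertensive", merges))  # unseen word, split into learned pieces
```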

Right now I need to catch up to you before I will have anything worth sharing. Look forward to more model sharing later!

Dana Ludwig

One of the things we tried was predicting 90-day readmissions. At least in that case, I could get ROC AUC scores in the low 70s using just classical NLP methods. Speaking of the data being noisy, this was literally the only dataset I've ever worked with where undersampling actually helped with training.
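A minimal sketch of random undersampling in plain Python, assuming a list of examples with 0/1 labels; `undersample` is a hypothetical helper. It should only be applied to the training split, so that validation and test metrics still reflect the real class balance.

```python
import random
from collections import Counter

def undersample(examples, labels, seed=0):
    """Randomly keep only as many examples per class as the smallest class has.

    Returns a shuffled list of (example, label) pairs.
    """
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(balanced)
    return balanced

notes = [f"note {i}" for i in range(10)]
labels = [1, 1] + [0] * 8  # 2 readmissions, 8 non-readmissions
balanced = undersample(notes, labels)
print(Counter(y for _, y in balanced))  # two of each class
```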

Though the predictive performance wasn't that great in my opinion, the feature importance plots kind of made sense (words like hospice, cancer, and DNR were common indicators of future readmission). I think this likely meant that the model was just doing a good job of learning which people were likely to die, though, so we abandoned the approach. Here's the notebook if you're curious:

Going back and removing the patients who died hurt the model substantially, reducing ROC AUC scores to the low 60s. After that, the only significant feature importance was how many times the patient had been to the hospital prior to that visit (I created a <newnote> token to track this). See the bottom of this notebook if you're interested:
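One way to read the `<newnote>` idea, sketched under assumptions: each patient's notes are concatenated chronologically with a separator token, so the number of separators encodes how many earlier notes (roughly, earlier visits if there is one note per visit) the patient has. The helper names and the `(timestamp, text)` input format are hypothetical, not taken from the notebook.

```python
from datetime import datetime

NEWNOTE = "<newnote>"  # separator token, named after the one in the post

def build_patient_document(notes):
    """Concatenate (timestamp, text) notes in chronological order, separated by NEWNOTE."""
    ordered = sorted(notes, key=lambda pair: pair[0])
    return f" {NEWNOTE} ".join(text for _, text in ordered)

def count_prior_notes(document):
    """Number of separators = number of notes before the most recent one."""
    return document.split().count(NEWNOTE)

notes = [
    (datetime(2150, 3, 1), "second visit"),
    (datetime(2150, 1, 1), "first visit"),
    (datetime(2150, 5, 1), "third visit"),
]
doc = build_patient_document(notes)
print(doc)                     # first visit <newnote> second visit <newnote> third visit
print(count_prior_notes(doc))  # 2
```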


If virtual, pls keep me in the loop, thx! ;)


From my experience using the tabular data to predict readmission and length-of-stay (LOS) risk, the 30-day window models performed much better. Is there any reason why you are looking at 90 days vs. 60 or 30? Granted, the data I used was from a larger hospital system.
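Since the window length comes up here, a small sketch of how the readmission label might be parameterized, assuming admissions are plain dicts with hypothetical keys (`subject_id`, `hadm_id`, `admit`, `discharge`, `died_in_hospital`). Admissions ending in in-hospital death are skipped, which is only a rough analogue of the "remove patients who died" experiment above.

```python
from datetime import datetime, timedelta

def readmission_labels(admissions, window_days=30):
    """Map hadm_id -> 1 if the same patient is readmitted within window_days of discharge.

    Admissions that end in in-hospital death are left out, since no
    readmission is possible after them.
    """
    by_patient = {}
    for adm in admissions:
        by_patient.setdefault(adm["subject_id"], []).append(adm)
    window = timedelta(days=window_days)
    labels = {}
    for stays in by_patient.values():
        stays.sort(key=lambda a: a["admit"])  # sorts our own per-patient list only
        for current, nxt in zip(stays, stays[1:] + [None]):
            if current["died_in_hospital"]:
                continue
            readmitted = nxt is not None and nxt["admit"] - current["discharge"] <= window
            labels[current["hadm_id"]] = int(readmitted)
    return labels

adms = [
    {"subject_id": 1, "hadm_id": 100, "admit": datetime(2150, 1, 1),
     "discharge": datetime(2150, 1, 5), "died_in_hospital": False},
    {"subject_id": 1, "hadm_id": 101, "admit": datetime(2150, 3, 1),
     "discharge": datetime(2150, 3, 4), "died_in_hospital": False},
    {"subject_id": 2, "hadm_id": 200, "admit": datetime(2150, 2, 1),
     "discharge": datetime(2150, 2, 9), "died_in_hospital": True},
]
print(readmission_labels(adms, window_days=30))  # {100: 0, 101: 0}
print(readmission_labels(adms, window_days=90))  # {100: 1, 101: 0}
```

Switching between 30-, 60-, and 90-day labels is then just a matter of changing `window_days`, which makes it easy to compare how the window affects both the class balance and the AUC.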

I think the very best place to learn about modern (neural network-based) NLP is this great course by Christopher Manning and Abigail See:
It is packed with hot topics as well as detailed explanations of basics of various architectures.


Nice idea, examining how things look after removing the patients who died.