NLP Information Extraction from Text

magiclantern · September 7, 2019, 4:03pm

@abhimanyu100 thanks for the suggestion. Traditional named entity recognition with libraries such as NLTK or spaCy is something I'm already familiar with. One problem I've run into on past projects is that most NER models are trained on general English text and therefore have poor precision and recall in the medical domain. There are medical-specific NER systems, of course, but most of them are clunky and lack a good Python API (many seem to be written in Java).
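To put a number on "poor precision and recall," one common approach is entity-level scoring: a predicted entity counts as correct only if its span and label exactly match a gold annotation. The sketch below assumes entities are represented as hypothetical `(start_char, end_char, label)` tuples and that exact matching is what you want. Some evaluations also give credit for partial overlaps, which this sketch does not do. The sentence, labels and "predictions" are made up purely to show the arithmetic.

```python
from typing import List, Set, Tuple

# Hypothetical entity representation: (start_char, end_char_exclusive, label)
Entity = Tuple[int, int, str]


def entity_prf(gold: List[Entity], predicted: List[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) using exact span-and-label matching."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up example: a general-English model catches the drug but
# only part of the condition, so the partial span counts as a miss.
text = "Patient started metformin for type 2 diabetes."
gold = [(16, 25, "DRUG"), (30, 45, "PROBLEM")]      # "metformin", "type 2 diabetes"
predicted = [(16, 25, "DRUG"), (30, 36, "PROBLEM")]  # "metformin", "type 2"

print(entity_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Exact matching is strict, so boundary errors like the truncated "type 2" above are penalized as both a false positive and a false negative.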

I'm less familiar with how to do NER with deep learning, hence my question on this thread. From the research I've read, deep learning techniques seem to perform as well as or better than parse-tree-based methods for NER.

So far, thanks to recommendations from others on this thread, I've been investigating sequence-to-sequence and related techniques. They seem promising, but I have yet to find a good annotated medical dataset to verify that they will work for my use case. I think the medical NLP datasets from the i2b2/n2c2 challenges will be a good option for that. If others are interested, you can request access at https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/. I've put in a request but haven't heard back yet, so I'm not sure how difficult access is to get.
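A related framing that many deep learning NER models use, alongside the sequence-to-sequence approaches mentioned above, is token-level sequence labeling with BIO tags (`B-` begins an entity, `I-` continues it, `O` is outside any entity). If annotated data is converted into that form, the model's tag output has to be decoded back into entity spans before it can be scored. The sketch below assumes that conversion has already happened; the tag names `DRUG` and `PROBLEM` are hypothetical, and the n2c2 data's native annotation format may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into labeled token spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            # Close any open span before a B-, O, or mismatched I- tag.
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            # A stray I- tag with no matching open span is treated as a new entity.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["Patient", "started", "metformin", "for", "type", "2", "diabetes", "."]
tags = ["O", "O", "B-DRUG", "O", "B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "O"]

for start, end, label in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# DRUG metformin
# PROBLEM type 2 diabetes
```

Once predictions are decoded to spans like this, they can be compared against gold spans with an exact-match precision/recall calculation.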