Fast.ai NLP meetup setup / Random MIMIC-III Discussions

Nice idea examining how things look after removing patients that died

Sparsity was the only real reason we focused on 90-day readmits. We were only interested in the subset of patients with ICD-9 codes falling under “diseases of despair” (e.g. depression, suicide, alcoholism), and there wasn’t enough data in that group to make tracking 30-day readmits feasible. 90 days seemed to be the shortest window that still gave us reasonable patient time series in the diseases-of-despair group to work with.
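
For anyone curious, the cohort logic is roughly the sketch below (pandas; the ADMISSIONS/DIAGNOSES_ICD table and column names are MIMIC-III’s, but the ICD-9 prefixes and the exact filtering here are illustrative, not our actual code set):

```python
import pandas as pd

# Illustrative ICD-9 prefixes for "diseases of despair" (depression, self-harm,
# alcohol-related); the real code set was curated more carefully than this.
DESPAIR_PREFIXES = ("2962", "2963", "311", "E95", "291", "303", "305")

adm = pd.read_csv("ADMISSIONS.csv", parse_dates=["ADMITTIME", "DISCHTIME"])
dx = pd.read_csv("DIAGNOSES_ICD.csv", dtype={"ICD9_CODE": str})

# Drop admissions that ended in death, then keep patients with any despair-related code
adm = adm[adm["HOSPITAL_EXPIRE_FLAG"] == 0]
is_despair = dx["ICD9_CODE"].astype(str).apply(lambda c: c.startswith(DESPAIR_PREFIXES))
despair_ids = dx.loc[is_despair, "SUBJECT_ID"].unique()
adm = adm[adm["SUBJECT_ID"].isin(despair_ids)].sort_values(["SUBJECT_ID", "ADMITTIME"])

# An admission counts as a 90-day readmit if the same patient comes back
# within 90 days of discharge
adm["NEXT_ADMIT"] = adm.groupby("SUBJECT_ID")["ADMITTIME"].shift(-1)
adm["READMIT_90D"] = (adm["NEXT_ADMIT"] - adm["DISCHTIME"]).dt.days <= 90
```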

I figured. It’s often the same problem we face when building cohorts: deciding how wide to cast the net to capture as many patients as possible. What’s the plan for this study group? Are we going to use this space to bounce ideas off each other about our individual projects?

Bringing the discussion back to general NLP, I’d love a more focused study group where we could delve further into advanced topics. I have some background in NLP, but we could all learn from and benefit from one another. How about a Slack/Discord group? We could have separate channels for MIMIC and other project-specific topics, and others who are interested could join and collaborate. Thoughts?

This sounds great to me! Since my goal is to delve deep into NLP, I’m okay with either platform.

Here’s some more detail on the vision I sort of have for this thing:

First, I think it would be beneficial to have an in-person “core” component of the group that can meet regularly in the Bay Area. People outside the area would be welcome to participate via Zoom or something like that, but I do think it’s beneficial for a lot of reasons for the people who can get together in person to get together, and we should enable that interaction. Thus, I’m thinking of this as a local group that others can (and should) participate in remotely, rather than a group designed to be done completely remotely. If this got big enough (which would be crazy), maybe there can end up being multiple in-person groups, but I’m certainly not thinking that far ahead haha.

I’m thinking of this more as a “research and development” group than a “study group”. By that I mean the goal in and of itself is not to “learn” the fundamentals of NLP, but to extend the practical range of NLP by both pushing the state of the art and implementing the tools to make those methods available for easy use. In that sense, there would be an assumption that participants would already have some experience with NLP and deep learning along with some coding maturity, or be capable of getting up to speed on their own relatively quickly (which in my opinion wouldn’t be that difficult, but mileage may vary).

My guiding approach is that practical >> cool. I’d like the group to be strongly focused on techniques that are viable in the near term rather than the long term (think company R&D over academic research). That means, for example, favoring topics like transfer learning, language models, and domain-specific work, while disfavoring topics like AGI, reinforcement learning, and other things that will have little or no utility in the next few years. Not as sexy, sure, but more useful to more people now, with the highest “ROI”.

In that spirit, think of this less as a traditional “seminar” where people get together, somebody gives a talk on a paper, and people ask questions about it, and more as a group for identifying, studying, and (most importantly) discussing the practical viability of modern NLP ideas and implementing them if they seem fruitful. These don’t have to be paper implementations; they can come from implementing your own ideas as well.

With this in mind, there would ideally be a development component to this. If we identify stuff that might work well but isn’t easily available in libraries now (which will usually be the case), we can implement the tools to make those techniques easier to use. How this would be done is, of course, still a matter up for debate. It would all be open-source.

To make this work, I think it would be beneficial early on to identify research thrusts. For example, some might be specifically interested in NLP applications to medicine, and may want to work on stuff specialized to that (e.g. medical-specific language models). Others (including myself) might be interested in extending the utility of transfer learning to as many useful NLP tasks as possible, and can experiment with and hopefully implement the tools to do that. Others might be interested in improving language models, or chatbots, etc. Of course, people can be involved in multiple thrusts if it suits them, but the goal would be active over passive participation.

Here’s sort of how I’d think of a typical meeting. At the beginning, each thrust area would share with everyone what they’re working on and what cool stuff they’ve found, which can result in some discussion. After that, each thrust would get together and work on their own defined goals. They’d have put together a reading list (like over Slack) and already have mostly read any “papers of the day”. Each person in the thrust would’ve decided on something they were going to try (e.g. reproducing a paper discussed before, running new experiments, or implementing software tools), and the bulk of the rest of the meeting would be a kind of group coding thing where people can collaborate and stuff like that.

I still want to flesh the details out more, and welcome feedback, but these are at least at a high level what I intended the “founding goals” of this group to be. It’s obviously not for the faint of heart and may be a lot more than some of you were expecting or hoping for. It would require more work on everyone’s part. But I think the outcome would be a lot more rewarding for everyone in the long term by going this route, as opposed to some general “paper discussion” group or beginner’s “study group” or something like that.

That being said, if this doesn’t scare you off, I’ll be glad to work with you to set something up and hopefully get something going in the next couple weeks. Now that I think about it, I shouldn’t have limited this post to the private “Part 2 (2019)” group, but oh well. I’ll figure out how to fix that later.


My postdoc training was in NLP of EHR notes. That was 3 years ago, before deep NLP was all the rage, but I believe many of the classical NLP techniques are still very relevant today, especially since medicine requires specialized domain knowledge in dataset availability, ontological tools, etc. I have recently returned to working in Health AI after a detour in entertainment tech and would like to dive more into this, having devoted much of my career so far to health informatics.

I second a more focused approach so we can build towards some contribution to the community, perhaps a library for the open-source minded. Deep learning in healthcare is a little behind the curve, so there’s still a lot of experimentation with methods already established in NLP or CV (i.e. low-hanging fruit). For example, unlike word2vec or GloVe in NLP, the jury is still out on medical concept and patient representations, despite attempts to bring the former into the latter; a toy sketch of what that looks like is below.
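
As a toy example of “bringing the former into the latter”, you can run skip-gram word2vec directly over visit sequences of diagnosis codes, treating each visit like a sentence. A minimal sketch (gensim >= 4.0; the codes and visits are made up purely for illustration):

```python
from gensim.models import Word2Vec

# Each "sentence" is one patient visit, written as a list of diagnosis code strings.
visits = [
    ["401.9", "250.00", "272.4"],   # hypertension, diabetes, hyperlipidemia
    ["250.00", "585.9", "272.4"],
    ["311", "305.00", "401.9"],
]

model = Word2Vec(
    sentences=visits,
    vector_size=100,   # embedding dimension
    window=5,          # codes within the same visit act as context
    sg=1,              # skip-gram
    min_count=1,
)

# Nearest-neighbor codes in the learned embedding space
print(model.wv.most_similar("250.00"))
```

Whether embeddings like these (or patient-level aggregates of them) actually transfer the way word vectors do in general NLP is exactly the open question.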

I think an initial brainstorming session to define the key goals of the local group would be a great start. Looking forward to it!


Where would you suggest the meetings take place? I am located in the South Bay. Driving up to SF might be problematic, but meeting on the Peninsula or in the East Bay might work. Or perhaps we could have several study groups around the Bay.

I’m also in the South Bay; something in the middle would work well for me too. I look forward to being a part of this study group!

I am in South Bay as well.

Trying to get folks to want to try these not-so-new approaches in healthcare can be difficult! I look forward to really making some strides towards advancing the current state.

I guess you are using AWD-LSTM. Is your head using the cache method, or is it a real attention layer as in Transformer-XL?

I live in the North Bay (Sausalito), so going much south of the city would be difficult for me. Ideally we can still try to keep this thing in SF (e.g. South of Market where the classes are), but I’ll wait and see how many inside the city are interested. People inside SF are far less likely to have cars, meaning commutes for them would be harder.

Anyone on here that lives in SF, the North Bay, or the East Bay that’s interested in attending these meetings?


I’m interested. I do think that in-person meetings are really valuable. I’m in Danville and not taking the class in person this time but would still be willing to go into SF or another location that’s fairly accessible. Would love to have a group in Walnut Creek! I don’t know if I’ll be able to participate at the level @rkingery is envisioning but presumably there’d be useful roles for folks like me.


What is the “cache method”? I am using a head with Wq, Wk, Wv like in the Transformer. I can only have 5 heads, because otherwise I can’t get it to load on the card (too much peak memory). I can share the code if that helps (but it is messy!)
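
Roughly, the head looks something like the sketch below (heavily simplified and cleaned up, not my actual code; 400 dims and 5 heads are just example numbers — the point is the Wq/Wk/Wv projections followed by scaled dot-product attention):

```python
import torch
import torch.nn as nn

class AttentionHead(nn.Module):
    """Simplified multi-head self-attention with Wq, Wk, Wv projections."""
    def __init__(self, d_model=400, n_heads=5):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.Wq = nn.Linear(d_model, d_model)
        self.Wk = nn.Linear(d_model, d_model)
        self.Wv = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):   # x: (batch, seq, d_model), e.g. AWD-LSTM outputs
        b, t, _ = x.shape
        def split(proj):     # -> (batch, heads, seq, d_head)
            return proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.Wq), split(self.Wk), split(self.Wv)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5   # scaled dot-product
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)      # re-merge the heads
        return self.out(ctx)

# Quick shape check
x = torch.randn(2, 128, 400)
print(AttentionHead()(x).shape)   # torch.Size([2, 128, 400])
```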


I am interested, and I am also in the South Bay.

Interested! I am currently trying to figure out summarization, and honestly it is so frustrating, as I can’t really find any easy-to-follow code on this task. It would be cool if we had an example notebook for fastai where we fine-tune the pretrained WikiText-103 model on our corpus (or even just a single document of our choice) and then are able to use it to summarize it.
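
The language-model fine-tuning half I can more or less piece together from the course notebooks; a rough sketch in fastai v1 (assuming a dataframe with a 'text' column holding the corpus; the file name and split are just illustrative):

```python
from fastai.text import *
import pandas as pd

df = pd.read_csv("corpus.csv")   # assumed to have a 'text' column
data_lm = TextLMDataBunch.from_df(
    path=".", train_df=df[:-500], valid_df=df[-500:], text_cols="text"
)

# AWD_LSTM loads the WikiText-103 pretrained weights by default
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)   # train the new head first
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)   # then fine-tune the whole model

# But this only generates continuations, not summaries
print(learn.predict("The document describes", n_words=50))
```

It’s the step from there to actual summaries (some kind of seq2seq decoder or pointer-generator on top of the fine-tuned encoder) that I can’t find a clean example of.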


How about meeting from 12pm to 2pm on Sundays at 404 Bryant Street (Sandbox Suites coworking space)? It is a 12-minute walk from the SF Caltrain station and a 13-minute walk from Montgomery BART Station. There is also free street parking on Sundays. I’m a member of the coworking space and would be happy to help. People in the South Bay or on the Peninsula can take the Caltrain Baby Bullet to SF: http://www.caltrain.com/schedules/weekend-timetable.html


Is this meetup happening already?
If so, could you post a link to the meetup?
I want to join remotely.

What do people think of doing it online or virtually, or at least having a Discord channel for it?