I have thousands of social media texts that describe drug and medication reactions. Many of these are for the same medications, but I am not wanting to model text talking about these specific meds, just to try and determine language that talks about reactions to medications like the ones I have in my data.
That is, for me the “gold” is not a mention of a specific medication, but rather a mention of a reaction to it (or any medication).
My idea is that I should be able to model language that talks about drug reactions, and apply that to any drug/medication. I should be able to download social media feeds that mention my (current) medication of interest - its very easy to use pattern matching for that task. The NLP part is to examine my social media data for language that dicsusses actual personal crises in relation to the medication. This way I hope that I can eliminate news articles, rants, gossip etc.
How do I deal with this data to eliminate an untoward focus on an actual drug name? Do I just replace it with a generic name in the text I feed to the model? But then wouldn’t it “go to town” assigning importance to that generic replacement word? I need something like a tf-idf approach, but I didn’t see how to do that with the techniques @jeremy covered with the ArXiv and ImDB data sets.
If anyone has an intertest in collaborating with me and my wife @Sedigh with this kind of project, we would love to hear from you