We are trying to build a clinical NLP solution (English-language) that fills in form data based on the contents of a document. The form fields include person names (three different people: the sender, the receiver, and the subject under discussion), a date and time, and a few fields describing the subject and his or her condition.
We’ve already tried this with an open-source tool called cTAKES (built on Apache UIMA, https://uima.apache.org/), which comes with models pre-trained on a specific corpus and makes extensive use of medical dictionaries such as SNOMED, ICD, etc.
We are currently exploring other options, including building such a pipeline ourselves from scratch (we need to decide based on feasibility and timelines). I'm looking for suggestions on a solution approach, insights from your experience, comments on the overall accuracy we could expect, and a ballpark timeline, in case any of you have worked on similar projects before, maybe in a different domain. For the timeline, assume two good developers with limited experience in building models, and one person who can double as an OK-ish developer (me).
Also, does it necessarily have to be a pipeline? Is a monolithic architecture possible? What considerations should guide that choice?
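To make the pipeline question concrete, here is a rough Python sketch of what I mean by a pipeline: independent stages that each fill part of the form. Everything in it is an assumption for illustration only: the names (`FormData`, `run_pipeline`, the `extract_*` stages), the sample document layout, and the regex rules, which are toy placeholders standing in for real models or dictionary lookups. It does not reflect how cTAKES works.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FormData:
    """Hypothetical target form; the field names are assumptions."""
    sender: Optional[str] = None
    receiver: Optional[str] = None
    subject: Optional[str] = None
    date_time: Optional[str] = None
    condition: Optional[str] = None


# A stage reads the document text and returns an updated copy of the form.
Stage = Callable[[str, FormData], FormData]


def extract_people(text: str, form: FormData) -> FormData:
    # Toy rule: assumes "From:", "To:" and "Patient:" header lines exist.
    def grab(label: str) -> Optional[str]:
        m = re.search(rf"^{label}:\s*(.+)$", text, flags=re.MULTILINE)
        return m.group(1).strip() if m else None

    return replace(form, sender=grab("From"), receiver=grab("To"),
                   subject=grab("Patient"))


def extract_date_time(text: str, form: FormData) -> FormData:
    # Toy rule: first ISO-like date, optionally followed by HH:MM.
    m = re.search(r"\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?", text)
    return replace(form, date_time=m.group(0) if m else None)


def extract_condition(text: str, form: FormData) -> FormData:
    # Toy rule: whatever follows "diagnosed with" up to the end of the sentence.
    m = re.search(r"diagnosed with ([^.\n]+)", text, flags=re.IGNORECASE)
    return replace(form, condition=m.group(1).strip() if m else None)


def run_pipeline(text: str, stages: List[Stage]) -> FormData:
    """Apply each stage in order, threading the partially filled form through."""
    form = FormData()
    for stage in stages:
        form = stage(text, form)
    return form


STAGES: List[Stage] = [extract_people, extract_date_time, extract_condition]

# Made-up sample document, not real clinical data.
SAMPLE = """From: Dr. Alice Rao
To: Dr. Bob Lee
Patient: John Doe
Date: 2021-03-04 10:30

The patient was diagnosed with type 2 diabetes. Follow-up in two weeks."""

if __name__ == "__main__":
    print(run_pipeline(SAMPLE, STAGES))
```

In this picture, a monolithic design would be a single function from the document text to a complete `FormData`, with no separately testable or replaceable stages. That trade-off is what I am asking about.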