Seq2seq: Better to improve spaCy or build a custom model for named entity recognition?

spaCy does a pretty amazing job at everything it does, but I’ve noticed issues with entity identification on my particular corpus that need to be addressed. For example, the out-of-the-box spaCy model tags “Carls Jr.” as a PERSON rather than an ORG. Additionally, it doesn’t tag titles (e.g., Dr., Prof., Mrs., Mr.) out of the box, but it does allow you to teach it new entity types.
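To make this concrete, here’s a minimal sketch showing both the misfire and the kind of rule-based patch spaCy supports (spaCy v3 API; the TITLE label and the example sentence are just for illustration):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Out of the box, "Carls Jr." comes back as PERSON here, not ORG.
doc = nlp("Dr. Smith grabbed lunch at Carls Jr.")
print([(ent.text, ent.label_) for ent in doc.ents])

# spaCy lets you patch gaps with rules: add an EntityRuler ahead of the
# statistical NER component. TITLE is a custom label, not a built-in one.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "TITLE", "pattern": t}
                    for t in ["Dr.", "Prof.", "Mrs.", "Mr."]])

doc = nlp("Dr. Smith grabbed lunch at Carls Jr.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Rules like this fix the title problem, but they obviously don’t generalize the way a retrained statistical model would.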

I’d be interested to hear from folks with experience using spaCy and/or custom architectures for named entity recognition. Would you recommend improving spaCy or going with a custom solution?


Hi, good to see this post… we are considering using spaCy for development work, and hopefully taking it to production. I’m looking for similar feedback on spaCy.

The developers of spaCy (Explosion AI) also built Prodigy, an annotation tool for training custom NER models.
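Annotations from Prodigy (or any other tool) can then be fed back into spaCy’s statistical NER component. A minimal sketch with spaCy v3’s training API (the example text, offsets, and epoch count are illustrative):

```python
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

# Illustrative annotation: character offsets marking "Carls Jr." as ORG.
TRAIN_DATA = [
    ("I grabbed lunch at Carls Jr. yesterday.",
     {"entities": [(19, 28, "ORG")]}),
]

# Resume training of the existing pipeline rather than starting from scratch.
optimizer = nlp.resume_training()
for _ in range(20):
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)

# NB: with only a handful of examples you risk catastrophic forgetting;
# in practice you'd mix in examples covering the entities you want to keep.
```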

@wgpubs, greetings!

What approach did you end up taking … improving spaCy, or a custom NER solution?

I actually created a wrapper around Hugging Face in order to use their token classification models with fastai. Right now I’m in the midst of updating everything to fastai v2, with plans to create a token classification model that can sit on top of a ULMFiT LM.
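Stripped of the fastai plumbing, the Hugging Face side boils down to a token classification head that emits one tag per subword token. A rough standalone sketch (recent transformers API; the checkpoint and label count here are illustrative, not necessarily what I used):

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Illustrative checkpoint and label count; swap in your own.
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)

inputs = tokenizer("Carls Jr. opened a store in Berlin.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
pred_ids = logits.argmax(-1)[0]              # one tag id per subword token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(list(zip(tokens, pred_ids.tolist())))
```

The wrapper’s main job is mapping those per-subword predictions back to word-level entity labels and plugging the whole thing into fastai’s training loop.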

Thanks! Were you getting good results with a token classification model? Will it give better results than an LSTM + CRF approach?

I’m getting slightly better results than those reported here: https://huggingface.co/transformers/examples.html#named-entity-recognition

The only thing I’m confused by is what they report for the BERT, RoBERTa, and DistilBERT models at the bottom. The F scores are really high, but I suspect they are not training on the German dataset mentioned above … I’ve asked what dataset those models were trained/tested against but never got an answer :frowning:

One cool discovery is this: after training on the GermEval 2014 dataset using a multilingual model … I ran inference on an English dataset and the results were as good as on the German data, even though not a single English example was seen during training. Also, I should mention that I used the typical fastai mechanisms to fine-tune the model (not the HF scripts).

We’ll see :slight_smile:
