NLP: Any libraries/dictionaries out there for fixing common spelling errors?

I was able to independently confirm these results myself against the glove.42B.300d.txt dataset. @er214’s findings here are indeed amazing!

And then I thought, could we improve the transformation vector by averaging the transformation vectors for multiple misspelled words (not just “relieable”)? The answer is yes.

I was able to match roughly 600 of the 700 misspelled words I discovered in the corpus I’m using with those in the glove dataset. Following the same process @er214 outlined above, I created 600 transformation vectors (one based on each misspelled word), and then took their average for a final transformation vector.
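
Here is a minimal sketch of the averaging step. It assumes each transformation vector is simply the correct word's embedding minus the misspelled word's embedding, which this post doesn't spell out. The function name `average_transformation` and the tiny 3-d `toy_embeddings` are made up for illustration; in practice the vectors would come from glove.42B.300d.txt.

```python
import numpy as np

def average_transformation(pairs, embeddings):
    """Average (correct - misspelled) over every pair found in the vocabulary."""
    diffs = [
        embeddings[right] - embeddings[wrong]
        for wrong, right in pairs
        # Skip pairs where either word has no embedding.
        if wrong in embeddings and right in embeddings
    ]
    if not diffs:
        raise ValueError("no pair has both words in the embedding vocabulary")
    return np.mean(diffs, axis=0)

# Hypothetical toy data standing in for real 300-d GloVe vectors.
toy_embeddings = {
    "relieable": np.array([1.0, 0.0, 0.0]),
    "reliable": np.array([1.0, 1.0, 0.0]),
    "recieve": np.array([0.0, 0.0, 2.0]),
    "receive": np.array([0.0, 2.0, 2.0]),
}
toy_pairs = [
    ("relieable", "reliable"),
    ("recieve", "receive"),
    ("definately", "definitely"),  # not in the toy vocabulary, so it is skipped
]

transform = average_transformation(toy_pairs, toy_embeddings)
print(transform)
```

Pairs with a word missing from the vocabulary are dropped, which mirrors only being able to match about 600 of the 700 misspellings.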

Using this final transformation vector, I went back through my dictionary of misspelled/correctly spelled words and achieved significantly better results with my ensembled vector than with a vector based on a single misspelling (98% found vs. 89%, respectively).
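
For anyone who wants to run a similar check, here is a rough sketch of one way to score a transformation vector. It assumes "found" means the correct spelling is among the top `k` nearest neighbours (by cosine similarity) of the misspelled word's embedding plus the transformation vector; that criterion is my assumption for the example. The helper names and the toy 3-d vectors are hypothetical.

```python
import numpy as np

def nearest_words(vector, embeddings, k=1, exclude=()):
    """Return the k vocabulary words with the highest cosine similarity to vector."""
    scored = []
    for word, emb in embeddings.items():
        if word in exclude:
            continue
        cos = float(np.dot(vector, emb) / (np.linalg.norm(vector) * np.linalg.norm(emb)))
        scored.append((cos, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

def fraction_found(pairs, embeddings, transform, k=1):
    """Share of pairs whose correct spelling shows up in the top-k neighbours."""
    hits = 0
    for wrong, right in pairs:
        shifted = embeddings[wrong] + transform
        # Exclude the misspelling itself so it cannot be its own best match.
        candidates = nearest_words(shifted, embeddings, k=k, exclude={wrong})
        hits += right in candidates
    return hits / len(pairs)

# Hypothetical toy vocabulary; real runs would use the 300-d GloVe vectors.
toy_embeddings = {
    "relieable": np.array([1.0, 0.0, 0.0]),
    "reliable": np.array([1.0, 1.0, 0.0]),
    "recieve": np.array([0.0, 0.0, 1.0]),
    "receive": np.array([0.0, 1.0, 1.0]),
    "banana": np.array([-1.0, -1.0, 0.0]),
}
toy_pairs = [("relieable", "reliable"), ("recieve", "receive")]
toy_transform = np.array([0.0, 1.0, 0.0])

score = fraction_found(toy_pairs, toy_embeddings, toy_transform, k=1)
print(score)
```

The brute-force loop is fine for a toy vocabulary; over the full GloVe vocabulary you would want a single matrix multiplication against normalized vectors instead.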

@er214, if you end up writing that paper or blog post and are interested in using my experiments, shoot me a message and I’d be glad to help.
