NLP: Any libraries/dictionaries out there for fixing common spelling errors?

So I'm actually applying what I've learned to a real-world corpus filled with spelling errors.

Most of these became apparent after looking at which vocabulary items weren't already in the wiki103 vocab (close to 5,000 missing tokens).

For now, I’ve simply created a dictionary of almost 700 items (key = a regex that identifies the misspelled word, value = the replacement), but it is taking a long time to clean up my dataset of close to 500k documents.

Is there a Python library or dictionary already out there that does a pretty good job of fixing misspellings? And is there an approach that would be faster than mine (which takes about 25 minutes to process 24,000 documents)?

5 Likes

Not sure if this quite answers your question, but I think using a language model as the foundation of a spelling checker would be an interesting idea for a product.

3 Likes

There’s SymSpell, which has a Python implementation linked from its README: https://github.com/wolfgarbe/SymSpell/blob/master/README.md

I don’t know how well the Python port performs, but it has a reputation for being quick.
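For what it’s worth, usage of the symspellpy port looks roughly like this (untested on my side; the frequency dictionary filename is the one that ships with the package, so treat the path as an assumption):

```python
from symspellpy import SymSpell, Verbosity

# max edit distance 2 with a prefix length of 7 is the commonly used setting
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
sym_spell.load_dictionary("frequency_dictionary_en_82_765.txt",
                          term_index=0, count_index=1)

# ask for the single closest suggestion within edit distance 2
suggestions = sym_spell.lookup("relieable", Verbosity.CLOSEST, max_edit_distance=2)
for suggestion in suggestions:
    print(suggestion.term, suggestion.distance, suggestion.count)
```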

On the idea of using a language model - it’s an excellent one, especially because it could potentially help with detecting real-word errors that elude more basic approaches.

Yah, not quite an answer … BUT an excellent idea. I’m going to think on this for a bit.

Thanks for the link. Will check it out and see how it works with my dataset.

Also not a direct answer, but it relates to the suggestion of a language model. Assuming you are using a deep learning model of some sort with an embedding layer, you could try to create new vectors for the missing/misspelt words rather than correcting them.

The trouble I have found with spelling corrections is that you get quite a lot of false positives, i.e. inappropriate corrections get thrown in. The risk is that you might change the meaning of a sentence by changing the spelling of an unknown - rather than misspelt - word. (For example, proper nouns, i.e. names, often get corrected in this way.)

What I’m currently trying is a relatively simple approach: calculating the average of the word vectors of known words within a specified window either side of each unknown word. I’m using the pre-trained GloVe vectors.
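Roughly, the averaging step looks something like this (a minimal sketch, assuming `glove` is a dict-like mapping from word to numpy vector; window size and names are illustrative):

```python
import numpy as np

def init_unknown_vectors(tokenized_docs, glove, dim=300, window=5):
    """Average the GloVe vectors of known words within `window` tokens
    either side of every occurrence of each unknown word."""
    sums, counts = {}, {}
    for doc in tokenized_docs:
        for i, tok in enumerate(doc):
            if tok in glove:
                continue  # only build vectors for unknown words
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            context = [glove[w] for w in doc[lo:hi] if w in glove]
            if context:
                sums[tok] = sums.get(tok, np.zeros(dim)) + np.sum(context, axis=0)
                counts[tok] = counts.get(tok, 0) + len(context)
    return {tok: sums[tok] / counts[tok] for tok in sums}
```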

A more complex option that I’m also trying (with limited success so far …) is to train a language model with a mask on the embedding layer that only allows updates to the unknown word vectors. So, initialise the unknown vectors using the averaging method mentioned above, and then train the language model with masked updating of the embedding layer in order to learn more appropriate embeddings for the missing/misspelt words.
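In PyTorch terms, the masked updating could be done with a gradient hook on the embedding weights - a minimal sketch, with the vocabulary size and `unknown_ids` purely illustrative:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 40000, 300
embedding = nn.Embedding(vocab_size, emb_dim)
# embedding.weight would be initialised from GloVe, plus the averaged
# context vectors for the unknown words described above

unknown_ids = [37000, 37001, 37002]  # indices of the unknown-word rows (illustrative)
grad_mask = torch.zeros(vocab_size, 1)
grad_mask[unknown_ids] = 1.0

# zero the gradient for every known-word row on each backward pass,
# so only the unknown-word vectors get updated during training
embedding.weight.register_hook(lambda grad: grad * grad_mask)
```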

My guess is that the problem with either approach is that you will often only have a few examples of the missing words, and therefore little to learn from.

Yup. That is why I’m burning a few hours manually going through the tokens in my vocab that wiki103 doesn’t know about, and for high-frequency known misspellings, replacing those values during pre-processing. For example, a common misspelling for “reliable” is “relieable” … and my intuition is that my models will perform better if I fix the spelling error rather than have the LM try to learn the misspelling token in addition to the correct spelling. I’ll try both and report back.

And let me know how your experimentation goes, both approaches sound interesting and we’re all in so much new territory here.

As an alternative, I was wondering about using word vector similarities to find corrections for common spelling errors. Pre-trained vectors like GloVe contain lots of spelling mistakes, and I thought these would be close (ideally closest) to their correct counterparts.

It turns out that this isn’t true, but the reality is more interesting - all the spelling mistakes are clustered together. So, for example, if you search for the nearest neighbours of “relieable” you get:
['relieable', 'relyable', 'realible', 'relable', 'reliabe', 'realiable', 'relaiable', 'relaible', 'trustworth', 'trustfull', 'consitant', 'stabel', 'accuarate', 'acurrate', 'accruate']

Note that the correct spelling isn’t anywhere to be seen. In fact, it’s miles away - there are 424,816 words closer (using cosine distance) to “relieable” than “reliable”!

So, I wondered if you could ‘correct’ a spelling by applying a transformation to move us from the spelling mistakes area of vector space to the correctly spelled words area. It turns out you can.

I’ve taken the average difference between the first 8 misspellings in the list above and “reliable”. This creates the transformation vector. We simply subtract this from the vector of a misspelled word to shift us into the correctly spelled words area, and then look for nearest neighbours.
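In code the whole thing is only a few lines. A rough sketch using gensim’s KeyedVectors (assuming the GloVe file has already been converted to word2vec format; paths are illustrative):

```python
import numpy as np
from gensim.models import KeyedVectors

# assumes the GloVe text file was converted beforehand, e.g. with
# `python -m gensim.scripts.glove2word2vec`
vectors = KeyedVectors.load_word2vec_format("glove.42B.300d.w2v.txt")

misspellings = ["relieable", "relyable", "realible", "relable",
                "reliabe", "realiable", "relaiable", "relaible"]

# average offset from "reliable" to each of its misspellings
offset = np.mean([vectors[w] - vectors["reliable"] for w in misspellings], axis=0)

def correct(word, topn=5):
    """Shift a misspelled word out of the 'misspellings' region of the
    space and return the nearest neighbours of the shifted vector."""
    shifted = vectors[word] - offset
    return vectors.similar_by_vector(shifted, topn=topn)

print(correct("becuase"))
print(correct("definately"))
```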

The table below shows a few examples of the nearest neighbour, both of the incorrectly spelled word, and then of the transformed word.

| misspelled word | neighbours of misspelled word | neighbours of transformed word |
|---|---|---|
| becuase | becuase; becasue; beacuse; b/c; becouse | **because**; even; fact; sure; though |
| definately | definately; definetly; definatly; definitly; definitely | **definitely**; sure; certainly; well; really |
| consistant | consistant; consistantly; inconsistant; consistent; consitant | **consistent**; reliable; consistant; consistently; accurate |
| pakage | pakage; packge; pacage; pacakge; packege | **package**; packages; pakage; reliable; offer |
| basicly | basicly; basicaly; jsut; actualy; bascially | **basically**; simply; just; actually; only |
| ocur | ocur; occour; occurr; occure; happpen | ocur; **occur**; arise; happen; reliably |

In every case except the last, the closest neighbour to the transformed word is the correct spelling (in bold). Note also that the transformed results are skewed towards ‘reliable’. I’m sure you could get better results by building a transformation vector based on a wider sample than just misspellings of reliable.

Interesting that this is such a consistent result. It apears to sugest that speling erors and typoes strongly co-ocur, rather than appearing in isolation?

55 Likes

I’m also not sure if there is an easy, open-source way to do that - however I just came across https://www.perfecttense.com/ on the front page of https://news.ycombinator.com/shownew - they seem to have a free trial, so might be worth checking out!

Ed,

Building on what you are saying, I had been looking into this for a while and here is what I have found so far:

https://norvig.com/spell-correct.html - This is basically Python spellcheck 101, which introduces some basic concepts in this space.

Underarmor Spell Check - Harry Xue at Underarmor suggests that spelling is context-sensitive (and I tend to agree with this line of thinking). Words that are considered correct at one company/university/etc. could be considered incorrect at another organization. This is why I believe we haven’t seen a universal spell checker introduced (just more generic tools that are prone to error).

• My thought is to perhaps perturb Wikipedia text with spelling errors on the input side and match it against the correct Wikipedia text on the output side. In essence, I am thinking of an encoder-decoder approach that answers the following (see the sketch after this list):

  1. Is this word actually misspelled?
  • NASA is not incorrectly spelled, and neither is USA, even though they are not typical words (they are abbreviations).
  • This is where I get hung up in my thinking. An encoder/decoder seems like the right first approach, but somehow I tend to believe that an attention model would increase performance.
  2. If the word is misspelled, what is the correct replacement word?
  • Somehow this would involve iterating through your personal documents (or a company’s or university’s) and comparing them against the most likely word from Wikipedia.
  • I get a little hung up in my thinking here too, because I somehow believe that sense embeddings might be related to this second part of the problem.
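Here is the sketch mentioned above - a deliberately simplistic illustration of generating (noisy, clean) sentence pairs with random transpositions, which could serve as encoder input and decoder target; all names are illustrative:

```python
import random

def transpose_noise(word, p=0.1):
    """Randomly swap one pair of adjacent characters with probability p."""
    if len(word) > 3 and random.random() < p:
        i = random.randrange(len(word) - 1)
        word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word

def make_pair(sentence):
    """Return (noisy, clean) - encoder input and decoder target."""
    noisy = " ".join(transpose_noise(tok) for tok in sentence.split())
    return noisy, sentence

print(make_pair("the parliament was reliable and consistent in its decisions"))
```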

I would be more than happy to start a github on this if there is sufficient interest in working on this.

Best,

Ralph

4 Likes

I think your comment got cut off @RAB

Ok - Fixed my original post.

On the speed issue, you don’t necessarily have to fix the spelling errors as a one-off preprocessing step. You could leave the source data as is, and then just have an extended version of the dictionary that converts from word/token to integer ID. You then correct the spellings by mapping from the incorrect spelling to the ID of the correctly spelled word.

Alternatively, if you do need to preprocess, then working with tokenized text would be quicker as it would allow you to avoid regular expressions. Build a dictionary that maps from original spelling => correct spelling for all tokens in the source text. Then you can convert each token without using any regex.
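Concretely, something like this sketch - a single dictionary lookup per token instead of ~700 regex passes over each document (the `corrections` map stands in for the hand-built misspelling dictionary mentioned earlier):

```python
# stand-in for the full ~700-entry misspelling -> correct-spelling map
corrections = {"relieable": "reliable", "definately": "definitely"}

def fix_tokens(tokens, corrections):
    """Replace any token found in the corrections map, leave the rest alone."""
    return [corrections.get(tok, tok) for tok in tokens]

doc = "this product is very relieable and definately worth it".split()
print(" ".join(fix_tokens(doc, corrections)))
```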

1 Like

This is mind-blowing.

6 Likes

BTW @er214 I really hope you consider writing this up in a little medium post or similar - it’s so fascinating and cool.

5 Likes

“I’ve taken the average difference between the first 8 misspellings in the list above and “reliable”. This creates the transformation vector. We simply subtract this from the vector of a misspelled word to shift us into the correctly spelled words area, and then look for nearest neighbours.”
This is a great idea; it would work if the spelling mistakes are frequent, i.e. enough examples exist to obtain context. A way of generating enough examples is to apply meaningful transformations to the raw data and then use the approach above (in the case of word2vec, transform the words in the context window?). Some ways of generating these transformations are transpositions, keyboard-proximity-based substitutions, etc. Using these, there should be a way to make the language model “robust” to spelling errors.
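For instance, a tiny sketch of keyboard-proximity substitution (the adjacency map only covers a handful of keys and is purely illustrative):

```python
import random

# adjacency on a QWERTY layout; only a few keys covered here
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz", "e": "wrsd", "i": "ujko", "o": "iklp",
    "r": "etdf", "s": "awedxz", "t": "ryfg", "n": "bhjm",
}

def keyboard_noise(word, p=0.15):
    """Replace one character with an adjacent key with probability p."""
    if word and random.random() < p:
        i = random.randrange(len(word))
        ch = word[i]
        if ch in KEYBOARD_NEIGHBOURS:
            word = word[:i] + random.choice(KEYBOARD_NEIGHBOURS[ch]) + word[i + 1:]
    return word

print([keyboard_noise("reliable") for _ in range(5)])
```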

2 Likes

The point of the post, as I understand it, is that this “spelling correction vector” turns out to be the same across all words. So it’s already done - no more words are required! (I’m sure there’s some opportunity to fine-tune it a bit, but the results already look amazingly accurate.)

2 Likes

This is really interesting!

I’m going to start doing some experiments on other word vectors (e.g., w2v, even wiki103, though I’m not sure how many, if any, misspellings exist there) and see if the results are similar. Lmk if you might be up for a summer project incorporating this into an LM-backed model that functions as a spell checker. I have some ideas.

-wg

That’s amazing! You could easily write a well-cited academic paper with this result.

By the way, what’s interesting is that the cluster seems to contain not only misspellings specifically, but more generally, variations of the word. For example, “b/c” is a commonly used and arguably correct substitute for “because”.
In order to single out misspellings, perhaps the model needs access to the character-level representation of the text as well.

1 Like

Some cool ideas in this thread! :slight_smile: @er214, as mentioned before, please keep track of your experimental results and consider writing this up, either as a blog post or academic paper. Let me know if you’d like any feedback.

Some ideas around work on zero-shot learning of ‘nonce’ and OOV words might also be relevant (I’ve written a paragraph about that here).

Another thing I would recommend: identify a dataset that has previously been used for evaluating spelling correction methods and run your experiments on it, to show that your model meaningfully improves upon previous work.

1 Like