I was able to independently confirm these results myself against the glove.42B.300d.txt
dataset. @er214’s findings here is indeed amazing!
And then I thought, could we improve the transformation vector by averaging the transformation vectors for multiple misspelled words (not just “relieable”)? The answer is yes.
I was able to match roughly 600 of the 700 misspelled words I discovered in the corpus I’m using with the those in the glove dataset. Following the same process @er214 outlined above, I created 600 transformation vectors (one based on each misspelled word), and then took there average for a final transformation vector.
Using this final transformation vector, I went back through my dictionary of misspelled/correctly spelled words and achieved significantly better results using my ensembled vector as compared to that based on a single misspelling (98% found vs. 89% respectively).
@er214, if you end up writing that paper or blog post and are interested in using my experiments, shoot me a message and I’d be glad to help.