Quora Dataset - Similar Questions

Quora recently released a dataset containing question pairs - https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs

Has anyone had a chance to play with it? What model did you use, and how did it perform?


Haven’t tried it yet, but I would bet the shared-weights example from the Keras functional API page would work well. (That example tries to figure out whether two tweets are from the same user, but it should be easy to extend to question pairs.)

https://keras.io/getting-started/functional-api-guide/

I have been playing with this…

Here is a recent blog post that uses a method described in a somewhat older paper.

They use the Manhattan distance between the encoded values of the two questions to determine semantic similarity. Maybe a different metric could help?
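
For anyone who wants to compare metrics, here is a minimal sketch of the two options side by side. It assumes the Manhattan distance is turned into a similarity score with exp(-distance), which is my reading of the approach and not something I have checked against the post's code. The function names and the example vectors are made up for illustration; in practice the vectors would be the encoder outputs for the two questions.

```python
import math


def manhattan_similarity(u, v):
    """Similarity from L1 distance, assumed here to be exp(-sum|u_i - v_i|).

    Returns 1.0 for identical vectors and decays toward 0 as they diverge.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1_distance)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, one possible alternative metric.

    Returns 0.0 if either vector is all zeros (a convention chosen here).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical encoded values for two questions (not real model output).
q1_encoding = [0.2, -0.1, 0.4]
q2_encoding = [0.1, -0.1, 0.5]

print("Manhattan similarity:", manhattan_similarity(q1_encoding, q2_encoding))
print("Cosine similarity:   ", cosine_similarity(q1_encoding, q2_encoding))
```

One difference to keep in mind: the Manhattan version is sensitive to the magnitude of the encodings, while cosine only looks at their direction, so the two can rank the same pairs differently.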

And this uses just a single LSTM. Maybe a Bi-LSTM with attention could improve the results?

Anyone else interested in this?