I’m just wondering how effective it would be to use a fine-tuned language model as an encoder instead of turning it into a classifier?
Specifically, I have some search terms (1-10 words) that I want to match to longer articles (<500 words). I was thinking of using the final hidden layer of a language model to produce one vector per search term or article, then measuring the distances (e.g. cosine similarity) between all the vectors to rank relevance.
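For concreteness, something like this minimal sketch with Hugging Face transformers is what I have in mind. The model name and the mean-pooling choice are just placeholders (in practice it would be my fine-tuned checkpoint), not a claim that this is the best setup:

```python
# Sketch: embed queries and articles with a BERT-style model's final hidden
# states, mean-pool into one vector per text, then rank by cosine similarity.
# "bert-base-uncased" is a stand-in for a fine-tuned checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    # Tokenize a batch of texts, padding/truncating to a shared length.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
    # Mean-pool over real tokens only, using the attention mask.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

queries = embed(["example search term"])
articles = embed(["first article text ...", "second article text ..."])
# Cosine similarity between every query and every article; higher = more relevant.
scores = torch.nn.functional.cosine_similarity(
    queries.unsqueeze(1), articles.unsqueeze(0), dim=-1
)
print(scores)  # shape: (num_queries, num_articles)
```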
It’s probably a sledgehammer to crack a nut, but alternative methods such as document vectors strike me as a little dumb: they don’t average well, and they usually aren’t fine-tuned to my corpus. So I was hoping this would be a more effective approach.