Setting up an experiment to find analogies

Hi, I used pre-trained word embeddings (GloVe) to find analogies between words, for example: “man” is to “woman” as “son” is to “____” (daughter).

On simple cases like this, it works perfectly. However, when I tried it on some more obscure cases, it failed. For example:

“eagle” is to “bird” as “beagle” is to “____” (expected: “dog”). The model returns “influenza”.
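
For context, here is a minimal sketch of how a query like this is typically computed. It assumes the usual vector-offset approach: take the nearest neighbour of b - a + c by cosine similarity, with the three query words excluded from the candidates. The function name `solve_analogy` and the 3-dimensional toy vectors are made up for illustration; they are not real GloVe values.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def solve_analogy(emb, a, b, c, topn=3):
    """Return the topn (word, cosine) pairs closest to b - a + c.

    The query words a, b and c are excluded from the candidates.
    """
    query = normalize(normalize(emb[b]) - normalize(emb[a]) + normalize(emb[c]))
    scores = {
        word: float(np.dot(query, normalize(vec)))
        for word, vec in emb.items()
        if word not in (a, b, c)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topn]

# Hypothetical toy vectors, chosen so the gender offset is one clean axis.
toy = {
    "man": np.array([1.0, 0.0, 1.0]),
    "woman": np.array([1.0, 1.0, 1.0]),
    "son": np.array([1.0, 0.0, 0.0]),
    "daughter": np.array([1.0, 1.0, 0.0]),
    "table": np.array([-1.0, 0.2, 0.3]),
}

best_word, best_score = solve_analogy(toy, "man", "woman", "son", topn=1)[0]
print(best_word)
```

With real embeddings, `emb` would be a dict from word to vector built from the GloVe text file, and the candidate set would be the whole vocabulary.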

I’m trying to get a better grasp of how one might evaluate these analogies more generally. It seems like any answer will be biased by distance in the distributional space: targets that lie close to the query words are more likely to be hit. If the correct target happens to be farther away, then even if the analogy is completely valid, it is very unlikely to be predicted correctly.
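
One way this suspicion could be probed, sketched under assumptions: for each analogy (a, b, c, d), record the cosine similarity between c and the expected answer d, the rank of d under the offset query b - a + c, and the rank of d under a baseline that ignores a and b and just looks for neighbours of c. The names `rank_of` and `probe`, and the toy vectors, are hypothetical.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def rank_of(target, query, emb, exclude):
    """1-based rank of target among non-excluded words, by cosine to query."""
    q = unit(query)
    scores = {
        word: float(np.dot(q, unit(vec)))
        for word, vec in emb.items()
        if word not in exclude
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(target) + 1

def probe(emb, quads):
    """For each (a, b, c, d), compare the offset method to a c-only baseline."""
    rows = []
    for a, b, c, d in quads:
        offset_query = unit(emb[b]) - unit(emb[a]) + unit(emb[c])
        rows.append({
            "analogy": (a, b, c, d),
            "cos_c_d": float(np.dot(unit(emb[c]), unit(emb[d]))),
            "rank_offset": rank_of(d, offset_query, emb, exclude={a, b, c}),
            "rank_c_only": rank_of(d, emb[c], emb, exclude={a, b, c}),
        })
    return rows

# Hypothetical toy vectors standing in for real GloVe embeddings.
toy = {
    "man": np.array([1.0, 0.0, 1.0]),
    "woman": np.array([1.0, 1.0, 1.0]),
    "son": np.array([1.0, 0.0, 0.0]),
    "daughter": np.array([1.0, 1.0, 0.0]),
    "table": np.array([-1.0, 0.2, 0.3]),
}

rows = probe(toy, [("man", "woman", "son", "daughter")])
for row in rows:
    print(row)
```

On a real analogy set, the rows could be binned by `cos_c_d` to see whether accuracy drops as the target moves away from c. If `rank_c_only` is already 1 for most of the successes, the offset itself may be contributing little.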

If anyone could give me a tip on how to set up a small experiment to explore this further, I’d be very grateful!