I find the combination of perplexity and accuracy a useful pair of metrics. “Perplexity is the probability of the correct label (word), passed through a log function, negated, and exponentiated (e^x). Perplexity is high when two probability distributions don’t match, and it’s low (approaching 1) when they do match.” From Trask, A. (2019). Grokking Deep Learning. Manning Publications, New York, USA.
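
As a minimal sketch of that definition (not Trask's code), here is the computation in Python. The quote describes a single label, exp(-log p); for a sequence of labels, the standard extension, assumed here, is to average the negative log-probabilities before exponentiating. The function name and example probabilities are illustrative.

```python
import numpy as np

def perplexity(probs_of_correct, eps=1e-12):
    """Perplexity per the quoted definition: take the log of each
    correct-label probability, negate, average, and exponentiate."""
    logp = np.log(np.clip(probs_of_correct, eps, 1.0))  # log-probability of each correct label
    return np.exp(-np.mean(logp))                        # e^(mean negative log-probability)

# A well-matched model puts high probability on the correct labels,
# so perplexity approaches 1:
print(perplexity(np.array([0.9, 0.95, 0.99])))  # ~1.06

# A poorly matched model puts low probability on the correct labels,
# so perplexity is high:
print(perplexity(np.array([0.1, 0.05, 0.02])))  # ~21.5
```

This is why the two metrics complement each other: accuracy only checks whether the correct label got the highest score, while perplexity reflects how much probability mass the model actually assigned to it.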