Ah, I see. So accuracy and loss are still a good approximation of the language model performance during training, but for more concrete results perplexity is what matters.
Also, thanks for the script! An Adam-W implementation was much needed.
Ah, I see. So accuracy and loss are still a good approximation of the language model performance during training, but for more concrete results perplexity is what matters.
Also, thanks for the script! An Adam-W implementation was much needed.