Does anyone know of a resource that lists the possible and best metrics for various NLP tasks?

I'm looking for a paper or any other resource that describes, for each kind of NLP task (e.g., classification, NER, summarization, language modeling), the following:

  1. Possible metrics
  2. An example of each metric and the intuition behind it
  3. Best metric(s) to use and why