I'm looking for a paper or any other resource that describes, for each kind of NLP task (e.g., classification, NER, summarization, language modeling), the following:
- Possible metrics
- An example of each metric and the intuition behind it
- Best metric(s) to use and why