Text classification on an imbalanced dataset of web articles


I have a binary text classification task for web articles in English, and the data is highly imbalanced. What should my approach be in this case?

  1. Should I go for Transformers or for the ULMFiT approach, given that the articles vary widely in length? I can have articles of 10-30 words as well as articles of more than 1000 words.

  2. What is the best approach for validating performance on such a task? Should my training dataset also be representative of the real scenario?
    E.g.: I have 2000 articles of Class A and 100 articles of Class B.

    • How should I design the training dataset (what percentage representation should each of the two classes have)?
  3. Are there any specific loss functions and metrics I should consider, as accuracy clearly won't work in this case? (I can go for precision and recall, but are there specific choices used in NLP, in particular for such imbalanced cases?)
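To make the 2000/100 setup in question 2 concrete, here is a minimal sketch of what I mean by keeping the class ratio consistent between training and validation, assuming scikit-learn is available; the article texts are placeholders:

```python
from sklearn.model_selection import train_test_split

# Placeholder data matching the question: 2000 of Class A, 100 of Class B
labels = ["A"] * 2000 + ["B"] * 100
texts = [f"article {i}" for i in range(len(labels))]

# stratify=labels preserves the ~95/5 class ratio in both splits
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

print(y_train.count("A"), y_train.count("B"))  # 1600 80
print(y_val.count("A"), y_val.count("B"))      # 400 20
```

This keeps the validation set representative of the real class distribution, while the training set could still be rebalanced separately (oversampling, class weights, etc.) if needed.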
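For question 3, one direction I am considering is weighting the loss inversely to class frequency and evaluating with minority-class precision/recall/F1 instead of accuracy. A sketch with scikit-learn's "balanced" weighting (the toy predictions below are made up just to show the metric calls):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import precision_score, recall_score, f1_score

# Labels matching the question's 2000/100 imbalance
y = np.array(["A"] * 2000 + ["B"] * 100)

# "balanced" weighting: n_samples / (n_classes * class_count)
weights = compute_class_weight(
    class_weight="balanced", classes=np.array(["A", "B"]), y=y
)
print({c: round(float(w), 3) for c, w in zip(["A", "B"], weights)})
# {'A': 0.525, 'B': 10.5}

# Report metrics for the minority class rather than overall accuracy
y_true = np.array(["A", "A", "B", "B", "A", "B"])
y_pred = np.array(["A", "B", "B", "A", "A", "B"])
print(precision_score(y_true, y_pred, pos_label="B"))
print(recall_score(y_true, y_pred, pos_label="B"))
print(f1_score(y_true, y_pred, pos_label="B"))
```

The resulting weights could be passed to a weighted cross-entropy loss during training; I am unsure whether that, focal loss, or resampling is the better fit here, which is part of what I am asking.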