Benchmarking model for comparing with ULMFiT

How should I benchmark a text classification task (non-English language) if I'm comparing it with a ULMFiT model trained in that language? The dataset comes from a paper. Is it OK to compare with the published results only and treat the published results as the benchmark?

It can depend on the problem. Naive Bayes SVM (NBSVM) is a pretty good benchmark regardless of language, and you can start with that. Can you share the paper details? The question might be clearer then.
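If it helps, here's a minimal NBSVM-style sketch in Python with scikit-learn, along the lines of Wang & Manning (2012). The toy dataset, the TF-IDF features, and the plain logistic regression over NB-scaled features (instead of the paper's SVM with interpolation) are my simplifications, not part of the original recipe:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def nb_log_count_ratios(X, y, alpha=1.0):
    # r = log( p(feature | y=1) / p(feature | y=0) ), smoothed by alpha
    p = alpha + X[y == 1].sum(axis=0)
    q = alpha + X[y == 0].sum(axis=0)
    return np.asarray(np.log((p / p.sum()) / (q / q.sum()))).ravel()

# toy binary data; substitute your own documents and labels
texts = ["good movie", "great acting", "terrible plot", "awful film"]
labels = np.array([1, 1, 0, 0])

vec = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
X = vec.fit_transform(texts)

r = nb_log_count_ratios(X, labels)
X_nb = X.multiply(r).tocsr()   # scale each feature by its NB log-count ratio

clf = LogisticRegression(max_iter=1000).fit(X_nb, labels)
print(clf.predict(X_nb))
```

The language-independence comes from the fact that everything here operates on token n-grams, so any reasonable tokenizer for your language will do.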


It depends on what you're trying to benchmark: the classifier or the language model. If it's the classifier, then yes, Naive Bayes works, or you could benchmark against a standard n-gram CNN or any of the classification models on nlpprogress.com.
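For the n-gram CNN, a minimal Kim (2014)-style sketch in PyTorch could look like the following; the filter sizes, filter count, and embedding dimension are illustrative defaults, not tied to any particular published setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_classes=2,
                 filter_sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # one Conv1d per n-gram width; each filter spans k consecutive tokens
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in filter_sizes
        )
        self.fc = nn.Linear(n_filters * len(filter_sizes), n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb, seq)
        # max-pool each n-gram feature map over time, then concatenate
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = TextCNN(vocab_size=30000)
logits = model(torch.randint(1, 30000, (8, 50)))  # dummy batch of 8 docs
```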

If it's the language model, it's typically easy to find word embeddings even for low-resource languages, though training BERT or other Transformer models might be hard if there isn't much data available.
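For example, fastText publishes pretrained vectors for 150+ languages at https://fasttext.cc, and loading them with gensim is a one-liner. The filename below assumes you've downloaded the .vec file for your language (replace "xx" with the language code), and the probe word is just a placeholder:

```python
from gensim.models import KeyedVectors

# text-format fastText vectors, e.g. cc.xx.300.vec from fasttext.cc
wv = KeyedVectors.load_word2vec_format("cc.xx.300.vec", binary=False)
print(wv.most_similar("word"))  # sanity check with a frequent word in your language
```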