ULMFiT: For each of the datasets, what is the human accuracy?

I’m using ULMFiT for a multi-label classification problem with terrific results thus far.

With fewer than 15k labeled examples (predicting 8 labels), I get the following accuracies when optimizing for various F-beta values against an ensemble of backward- and forward-trained models:

| F-beta | Validation Accuracy |
|--------|---------------------|
| 0.5    | 0.9375              |
| 1      | 0.9375              |
| 2      | 0.9276              |
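For anyone curious how these F-beta numbers are computed for a multi-label problem, here's a minimal sketch using scikit-learn. The label matrices below are made-up illustrations (my real data is 8 labels per example, but not these values), and `average='samples'` is one reasonable choice for multi-label scoring:

```python
import numpy as np
from sklearn.metrics import fbeta_score

# Hypothetical binary indicator matrices: 4 examples x 8 labels.
# In practice y_pred would come from thresholding the averaged
# probabilities of the forward and backward models.
y_true = np.array([[1, 0, 1, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 1, 0, 0, 0],
                   [1, 1, 0, 0, 0, 0, 0, 1],
                   [0, 0, 0, 1, 0, 0, 0, 0]])
y_pred = np.array([[1, 0, 1, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 0, 0],   # one missed label (recall hit)
                   [1, 1, 0, 0, 0, 0, 0, 1],
                   [0, 0, 1, 1, 0, 0, 0, 0]])  # one spurious label (precision hit)

# beta < 1 weights precision more; beta > 1 weights recall more.
for beta in (0.5, 1, 2):
    score = fbeta_score(y_true, y_pred, beta=beta, average='samples')
    print(f"F{beta}: {score:.4f}")
```

Because beta trades precision against recall, the best classification threshold (and hence the accuracy you end up with) shifts as you change it, which is why the three rows in the table above differ.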

My question is: how does this compare with human accuracy?

Have there been any studies, on the various datasets mentioned in the ULMFiT paper (e.g., IMDb, TREC, etc.), that measured how accurate human beings were compared to the machine-learned predictions? I've wondered this myself, and I'm sure to be asked it by folks in my organization when it comes to justifying the use of ML for tasks like these.
