Hey fast.ai community members!
We finished a small but useful extension of the original ULMFiT paper, exploring how the amount of available unlabeled domain data affects accuracy on the target task. The headline takeaways from the work are:
- 75% of the accuracy boost found in the original paper can be achieved with about a third of the unlabeled data
- It confirms the intuition that using any amount of domain data to extend a ULM into a domain-specific model beats the ULM on its own, and that ULM + domain is always better than a domain-only model.