How to get the other datasets in ULMFiT paper?


#1

Could anyone please tell me how to get the other datasets used in the paper (Universal Language Model Fine-tuning for Text Classification), i.e. Yelp-bi, Yelp-full, DBpedia, AG, and TREC-6, so that I can reproduce the results? These datasets seem to have been carefully prepared by the previous researchers. Thanks.


(David Bressler) #2

Good luck… I’d be interested in seeing if you’re able to reproduce the results. Please post once you’ve tried.


#3

I found https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M. Most of the datasets can be found there, already reformatted to CSV for easy use.
So far I have tried to reproduce the results using the Jupyter notebook code from lesson 10 (I couldn't find the experiment code for the paper). I can't reproduce all of the results, probably because of differences in hyperparameters and because I had to reduce the batch size (GPU memory limit).
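In case it helps anyone else, here is a minimal sketch of how I load one of those CSVs. It assumes the ag_news_csv layout from that Drive folder (files without a header row, columns: label, title, text); adjust the path and column names if your extracted archive differs.

```python
import pandas as pd

# Assumed layout after extracting e.g. ag_news_csv.tar.gz from the Drive folder:
# no header row, columns are label, title, text.
cols = ["label", "title", "text"]
train_df = pd.read_csv("ag_news_csv/train.csv", header=None, names=cols)
test_df = pd.read_csv("ag_news_csv/test.csv", header=None, names=cols)

# Merge title and body into a single text field before feeding it to the
# lesson 10 style classifier pipeline.
for df in (train_df, test_df):
    df["text"] = df["title"].fillna("") + " " + df["text"].fillna("")

print(train_df["label"].value_counts())
```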


(Jeremy Howard) #4

Yup that’s the right source. The imdb_scripts folder in the dl2 course repo has the scripts we used, FYI.