How to get the other datasets in the ULMFiT paper?

Could anyone please tell me how to get the other datasets used in the paper (Universal Language Model Fine-tuning for Text Classification), namely Yelp-bi, Yelp-full, DBpedia, AG, and TREC-6? I want to reproduce the results. These datasets seem to have been carefully selected by previous researchers. Thanks.

Good luck… I’d be interested in seeing if you’re able to reproduce the results. Please post once you’ve tried.

I found https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M. Most of the datasets can be found there, and they have been reformatted to CSV for easy use.
For now, I have tried to reproduce the results using the Jupyter notebook code from lesson 10 (I can’t find the experiment code for the paper). I can’t reproduce all of the results, probably because of the hyperparameters and the smaller batch size I had to use (my GPU memory is limited).
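
In case it helps anyone loading those CSV files, here is a minimal sketch of how they could be read into (label, text) pairs. It assumes the files have no header row, that the first column is a 1-based class index, and that every remaining column (title, description, review text, and so on) is text to be joined together. Please check that against the actual files before relying on it. The function name `read_labeled_csv` and the two sample rows are made up for illustration.

```python
import csv
import io


def read_labeled_csv(handle):
    """Yield (label, text) pairs from a headerless CSV.

    Assumption: column 0 is a 1-based class index and all other
    columns are text fields. Verify this against the real files.
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        label = int(row[0]) - 1  # shift to a 0-based label
        # Join title/body style columns into a single text field.
        text = " ".join(field.strip() for field in row[1:] if field.strip())
        yield label, text


# Illustrative rows only, not taken from any real dataset.
sample = io.StringIO(
    '"3","Markets rally","Stocks rose sharply on Monday."\n'
    '"1","Talks resume","Negotiators met again this week."\n'
)
pairs = list(read_labeled_csv(sample))
print(pairs)
```

For a real file, something like `open("train.csv", newline="", encoding="utf-8")` could be passed in place of the `StringIO` object (the file name here is also just a placeholder).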

Yup that’s the right source. The imdb_scripts folder in the dl2 course repo has the scripts we used, FYI.

Have you found the TREC-6 dataset? It’s really hard to find a CSV version.
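
If only the raw label files turn up, converting them to CSV yourself may be an option. This is a rough sketch, assuming each line of the raw TREC file looks like `COARSE:fine question text`, where the coarse label is one of the six TREC-6 classes (ABBR, DESC, ENTY, HUM, LOC, NUM). The helper name `trec_to_rows` and the sample lines are made up for illustration, so check the format of whatever copy you download.

```python
import csv
import io


def trec_to_rows(lines):
    """Yield (coarse_label, question) pairs from raw TREC-style lines.

    Assumption: each line is "COARSE:fine question text".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tag, question = line.split(" ", 1)  # label tag, then the question
        coarse = tag.split(":", 1)[0]       # keep only the coarse class
        yield coarse, question


# Illustrative lines in the assumed format.
raw = [
    "DESC:manner How did serfdom develop in and then leave Russia ?",
    "HUM:ind Who wrote this novel ?",
]
rows = list(trec_to_rows(raw))

# Write the rows as CSV (to a string here; a file works the same way).
out = io.StringIO()
csv.writer(out).writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

The raw files may not be UTF-8, so opening them with an explicit `encoding` argument (and trying another one if decoding fails) is probably worth doing.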