How to get the other datasets in the ULMFiT paper?

michael_c · October 10, 2018, 2:09am

Could anyone please tell me how to get the other datasets used in the paper (Universal Language Model Fine-tuning for Text Classification), namely Yelp-bi, Yelp-full, DBpedia, AG, and TREC-6? I want to reproduce the results. These datasets seem to have been carefully selected by previous researchers. Thanks.

DavidBressler · October 12, 2018, 12:52am

Good luck… I’d be interested in seeing if you’re able to reproduce the results. Please post once you’ve tried.

michael_c · October 12, 2018, 3:18am

I found https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M. Most of the datasets can be found there, and they have been reformatted to CSV for easy use.
For now, I have tried to reproduce the results using the Jupyter notebook code from lesson 10 (I can’t find the experiment code for the paper). I can’t reproduce all of the results, probably because of the hyperparameters and the smaller batch size I had to use (my GPU memory is limited).
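
For anyone loading these CSV files, here is a minimal sketch using only the Python standard library. It assumes, without confirmation from this thread, that each file has no header row, that the first column is a 1-based class label, and that the remaining columns are text fields (for example a title and a body). The function name `load_classification_csv` is hypothetical and is not taken from the course code.

```python
import csv
from pathlib import Path


def load_classification_csv(path):
    """Read a headerless CSV whose first column is a 1-based class label.

    Assumed layout (not confirmed in this thread): label first, then one or
    more text columns. The text columns are joined with a single space.
    """
    texts, labels = [], []
    with Path(path).open(newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            labels.append(int(row[0]) - 1)  # shift labels to start at 0
            texts.append(" ".join(row[1:]))
    return texts, labels
```

It could be called as `texts, labels = load_classification_csv("ag_news_csv/train.csv")`, where the path is only a placeholder for wherever the downloaded archive was extracted. If a file turns out to have a header row or a different column order, the label parsing would need to be adjusted.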

jeremy · October 12, 2018, 5:15am

Yup that’s the right source. The imdb_scripts folder in the dl2 course repo has the scripts we used, FYI.

DeepX · October 6, 2020, 3:57pm

Have you found the TREC-6 dataset? It’s really hard to find a CSV version.