There are lots of docs on how to deal with the smaller IMDB dataset behind URLs.IMDB_SAMPLE, but the larger dataset (URLs.IMDB) is structured differently.
It looks like this:
/tmp/aclImdb
├── imdb.vocab
├── imdbEr.txt
├── test
│   ├── neg  # .txt files under here
│   └── pos  # .txt files under here
└── train
    ├── neg  # .txt files under here
    └── pos  # .txt files under here
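
To make the layout concrete, here is a minimal sketch in plain Python (no fastai calls) of how one split could be read into texts and labels. The helper name read_imdb_split is hypothetical, and it assumes each review is a UTF-8 .txt file sitting directly under the neg or pos folder, with the folder name serving as the label.

```python
from pathlib import Path


def read_imdb_split(root, split):
    """Read one split ('train' or 'test') of the folder layout shown above.

    Hypothetical helper, not part of fastai. Returns two parallel lists:
    the review texts and their labels ('neg' or 'pos'), taken from the
    name of the folder each file lives in.
    """
    texts, labels = [], []
    for label in ("neg", "pos"):
        folder = Path(root) / split / label
        # Sort so the order is stable across runs and platforms.
        for path in sorted(folder.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

With the dataset unpacked at /tmp/aclImdb, calling read_imdb_split("/tmp/aclImdb", "train") should give the training reviews, and passing "test" should give the held-out ones.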