ImportError: cannot import name 'TextDataset'


(Amogh Mishra) #1

I am trying to learn ULMFiT and have been following the imdb script, but I am unable to import TextDataset.

Code:

from fastai.text import TextDataset
trn_ds = TextDataset(trn_clas, trn_labels)
val_ds = TextDataset(val_clas, val_labels)
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))
trn_dl = DataLoader(trn_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=trn_samp)
val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
md = ModelData(PATH, trn_dl, val_dl)

Output:

ImportError: cannot import name 'TextDataset'

#2

You are probably using fastai v1. Those scripts require fastai v0.7; they haven’t been ported to v1 yet.
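
If you are not sure which version you have, something like the following should tell you. This is only a rough sketch: the helper names installed_version and major_version are made up for illustration, and it relies on the standard-library importlib.metadata (Python 3.8+), so it only sees packages installed through pip or conda. A fastai checkout that was symlinked into the project folder will probably show up as "not installed".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not found."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading integer of a version string such as '1.0.61'."""
    return int(version_string.split(".")[0])


fastai_version = installed_version("fastai")
if fastai_version is None:
    print("fastai was not found as an installed package in this environment")
elif major_version(fastai_version) >= 1:
    print(f"fastai {fastai_version}: v1 or later, so the v0.7 imdb script will not run as-is")
else:
    print(f"fastai {fastai_version}: a 0.x release, which is what the script expects")
```

If it reports v1 or later, that would explain the ImportError above.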


(Amogh Mishra) #3

Thank you! I didn’t realise that.


(Robyn Speer) #4

Lots of discussion still refers to TextDataset. What has it been replaced by?

I’m looking for a way to train a large language model without having to keep all of the token IDs in RAM.