Create TextDataBunch from pkl and pth

How would you create a TextDataBunch from .pkl and .pth files?

from fastai.text import *
path = untar_data(URLs.WT103_FWD)

This gave me .pkl and .pth files.

And I would like to have data_lm for:

l = language_model_learner(data_lm, arch=AWD_LSTM, pretrained_fnames=['lstm_fwd', 'itos_wt103'], drop_mult=0.3)

What should be my data_lm?
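
For reference, a minimal sketch of one way a data_lm is usually built, using the IMDB sample rather than the poster's own corpus; it assumes (as in fastai v1) that pretrained_fnames takes the two file names without extensions and that the .pth/.pkl files have been copied into the learner's models directory first:

from fastai.text import *

# Build a language-model databunch from any text corpus; the IMDB sample is only an illustration.
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# Assumption: lstm_fwd.pth and itos_wt103.pkl were copied into path/'models' beforehand;
# pretrained_fnames are given without their extensions.
learn = language_model_learner(data_lm, arch=AWD_LSTM,
                               pretrained_fnames=['lstm_fwd', 'itos_wt103'],
                               drop_mult=0.3)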

Another possible problem: the .pkl file inside URLs.WT103_FWD was exported using pickle.dump.

path = untar_data(URLs.WT103_FWD)
path.ls()
# [PosixPath('/content/data/wt103-fwd/itos_wt103.pkl'),
# PosixPath('/content/data/wt103-fwd/lstm_fwd.pth')]
itos = pickle.load(open( '/content/data/wt103-fwd/itos_wt103.pkl', 'rb')) # no problem
learn = load_learner(path,'/content/data/wt103-fwd/itos_wt103.pkl') # RuntimeError: Invalid magic number; corrupt file?

It works when I pickle.load it, but load_learner fails with the invalid magic number error.

I assume the problem is that the .pkl was created with pickle.dump rather than torch.save.
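
A small illustration of that difference; this is a sketch rather than the library's exact internals, but load_learner goes through torch.load, which expects a file written by learn.export() via torch.save, whereas itos_wt103.pkl is just a plain pickled vocabulary list:

import pickle
import torch

# Plain pickle of the vocabulary list: pickle.load works fine.
itos = pickle.load(open(path/'itos_wt103.pkl', 'rb'))

# torch.load(path/'itos_wt103.pkl')
# would raise "RuntimeError: Invalid magic number; corrupt file?" because the file
# was written with pickle.dump, not torch.save; load_learner hits the same check.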

This doesn’t add much to my original post but may be important.

This is the URL of a model, not a dataset.

Thanks @sgugger, so you are kindly suggesting that I cannot create a dataset based on URLs.WT103_FWD, because this is the URL of the model.

URLs.IMDB would be an example of a dataset URL.
Does fastai have a URL for WT103 as a dataset, or not?

Could I use a URL like this one: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip for the original post's question?
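
If that route were taken, note that (as far as I can tell) untar_data appends .tgz to the URL it is given, so a .zip hosted elsewhere would have to be downloaded and extracted manually. A hedged sketch of such a workflow; the destination path is only an illustration:

from fastai.text import *
import zipfile

url = 'https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip'
dest = Config.data_path()/'wikitext-103-v1.zip'
download_url(url, dest)                    # fastai's plain HTTP download helper
with zipfile.ZipFile(dest) as zf:
    zf.extractall(Config.data_path())      # extracts the raw wikitext-103 token files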

My general goal is predicting the next word. I would like to use the predict method from fastai.
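
In fastai v1 that is LanguageLearner.predict; a hedged example, assuming learn is the learner produced by language_model_learner in the sketch above:

# Generate 5 words following the prompt; the output is sampled, so it varies between runs.
print(learn.predict("The weather today is", n_words=5))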

I think what you're looking for is untar_data(URLs.WIKITEXT). I don't know why you would need this anyway, because you already have the pretrained weights for the language model. If you want to retrain the model from scratch yourself, you can refer to this notebook.
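
For reference, a minimal sketch of that suggestion, assuming URLs.WIKITEXT is available in the installed fastai version; listing the extracted files first avoids guessing at their layout:

from fastai.text import *

path = untar_data(URLs.WIKITEXT)  # wikitext-103 as a dataset, unlike URLs.WT103_FWD (pretrained weights)
path.ls()                         # check which files are provided before building a TextLMDataBunch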
