prosti
December 13, 2019, 4:38pm
1
How would you create a TextDataBunch from .pkl and .pth files?
from fastai.text import *
path = untar_data(URLs.WT103_FWD)
This gave me .pkl and .pth files, and I would like to have a data_lm for:

learn = language_model_learner(data_lm, arch=AWD_LSTM, pretrained_fnames=['lstm_fwd', 'itos_wt103'], drop_mult=0.3)

What should my data_lm be?
prosti
December 13, 2019, 5:47pm
2
Another possible problem: the .pkl file inside URLs.WT103_FWD was exported using pickle.dump.
path = untar_data(URLs.WT103_FWD)
path.ls()
# [PosixPath('/content/data/wt103-fwd/itos_wt103.pkl'),
# PosixPath('/content/data/wt103-fwd/lstm_fwd.pth')]
itos = pickle.load(open( '/content/data/wt103-fwd/itos_wt103.pkl', 'rb')) # no problem
learn = load_learner(path,'/content/data/wt103-fwd/itos_wt103.pkl') # RuntimeError: Invalid magic number; corrupt file?
It works when I pickle.load it, but load_learner shows Invalid magic. My assumption is that the problem is pickle.dump having been used instead of torch.save when creating the .pkl.
This doesn’t add much to my original post but may be important.
sgugger
December 13, 2019, 6:12pm
3
This is the URL of a model, not a dataset.
prosti
December 14, 2019, 10:37am
4
Thanks @sgugger, so you are kindly suggesting I cannot create a dataset based on URLs.WT103_FWD, because this is the URL of the model. URLs.IMDB would be an example of the URL of a dataset.
Does fastai have a URL for WT103 as a dataset? Could I use a URL like this one: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip for the question in my original post?

My general goal is predicting the next word, and I would like to use the predict method from fastai.
I think what you're looking for is untar_data(URLs.WIKITEXT). I don't know why you would need it anyway, because you already have the pretrained weights for the language model. If you want to retrain the model yourself from scratch, you can refer to this notebook.