Lesson3-imdb - Loading Databunch fails

peppa · March 21, 2019, 6:36pm

Finally I found the solution. It turns out the keyword ‘path’ in the load_data() function cannot be changed inside the call, so path has to be set first. I need to run it as below. I've also included the context of all my scripts here in case other people need it.

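from fastai.text import *

path = LM_PATH  # the directory that holds all_texts.csv
data_lm = TextLMDataBunch.from_csv(path, 'all_texts.csv', text_cols=1, label_cols=0)
data_lm.save('data_lm.pkl')  # the filename must be a quoted string
data_lm = load_data(path, fname='data_lm.pkl')  # load the same filename that was saved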

Basically, we should assign the value of path before using it; we cannot set it inside the call to load_data.
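In fastai v1, both DataBunch.save() and load_data() resolve the file relative to a directory (path/file), so the name given to save() has to match the one passed to load_data(). Below is a standard-library sketch of that convention. save_bunch and load_bunch are hypothetical stand-ins that use plain pickle rather than fastai's own serialization; they only illustrate why a mismatched filename ends in FileNotFoundError.

```python
import pickle
from pathlib import Path


def save_bunch(obj, path, file="data_save.pkl"):
    """Hypothetical stand-in for DataBunch.save: pickles obj to path/file."""
    target = Path(path) / file
    with open(target, "wb") as f:
        pickle.dump(obj, f)
    return target


def load_bunch(path, file="data_save.pkl"):
    """Hypothetical stand-in for load_data: unpickles path/file."""
    with open(Path(path) / file, "rb") as f:
        return pickle.load(f)
```

Saving with save_bunch(obj, tmp, 'data_lm.pkl') and loading with load_bunch(tmp, 'data_lm.pkl') round-trips, while load_bunch(tmp, 'data_lm_export.pkl') raises FileNotFoundError. Leaving the filename off on both sides falls back to 'data_save.pkl', mirroring fastai's default.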

leviritchie · May 15, 2019, 4:55pm

For anyone finding this thread down the road, “fname=” didn’t seem to work for me anymore, but just passing the filename as the second argument (after path) should still work.
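A likely explanation (I have not pinned down the exact release): later fastai 1.0.x versions renamed load_data's second parameter from fname to file. A keyword argument is bound by name, so it breaks when the parameter is renamed, while a positional argument does not. The sketch below uses two hypothetical stand-in functions for an old and a new signature, plus a small helper, filename_param (also hypothetical), that reads the parameter name with inspect.signature.

```python
import inspect


def filename_param(func):
    """Return the name of func's second parameter (e.g. 'fname' or 'file')."""
    names = list(inspect.signature(func).parameters)
    return names[1] if len(names) > 1 else None


# Hypothetical stand-in for an older load_data signature.
def load_data_old(path, fname="data_save.pkl"):
    return f"{path}/{fname}"


# Hypothetical stand-in for a newer signature with the parameter renamed.
def load_data_new(path, file="data_save.pkl"):
    return f"{path}/{file}"
```

A positional call such as load_data_new("data", "x.pkl") works with both stand-ins, whereas load_data_new("data", fname="x.pkl") raises TypeError. With fastai installed, filename_param(load_data) should show which name your version expects.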

tjaffri · September 22, 2019, 1:13am

Tip for others who may be running into this. You do need to check that your fastai version is up to date, e.g. use pip list | grep fastai to list the version you have installed. In my case it was outdated, and I had to run pip install --upgrade fastai to get the latest version.
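The same check can be done from inside Python or a notebook, where piping pip list through grep is less convenient. This sketch uses importlib.metadata, which is in the standard library from Python 3.8 on (on 3.7, printing fastai.__version__ is an alternative). installed_version and version_tuple are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="fastai"):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(v):
    """Turn '1.0.60' into (1, 0, 60) so versions compare numerically.

    Only handles plain dotted integers; tags like '2.0.0rc1' would need
    the packaging library's Version class instead.
    """
    return tuple(int(part) for part in v.split("."))
```

The tuple form matters: version_tuple('1.0.60') > version_tuple('1.0.9') is True, whereas the plain string comparison '1.0.60' > '1.0.9' is False.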

C-L-Avila · September 25, 2019, 8:39am

Hi,
I have a problem: the file data_save.pkl does not appear, but it does appear in the notebook https://github.com/fastai/course-nlp/blob/master/8-translation-transformer.ipynb. Can someone tell me what is happening?

AdvancingCat · October 1, 2019, 12:45am

Hi, I think this link https://s0docs0fast0ai.icopy.site/basic_data.html#load_data might illustrate that load_data is a classmethod of DataBunch. What do you think?

AdvancingCat · October 1, 2019, 1:54am

If the load() method of TextDataBunch is deprecated, and TextClasDataBunch is a subclass of TextDataBunch, then TextClasDataBunch.load(path) should also be deprecated, and that turns out to be right.
If I run TextClasDataBunch.load(path), it also raises “FileNotFoundError: [Errno 2] No such file or directory: '/root/.fastai/data/imdb_sample/tmp/itos.pkl'”.
Is there also a substitute for “TextClasDataBunch.load(path)”?

diamondspark · January 15, 2020, 12:58am

Use the following to save (by default this writes data_save.pkl under the DataBunch's path):
data_lm.save()

Then load it with:
data_lm = load_data(path)

uw198162 · February 20, 2020, 4:29pm

Using fastai version 1.0.60 and I’m still getting the same itos.pkl FileNotFoundError. Here’s the complete error message:

$ python train.py 
This is a extremely well-made film. The acting, script and camera-work are all first-rate....
Traceback (most recent call last):
  File "train.py", line 17, in <module>
    data = TextClasDataBunch.load(path)
  File "/home/cosimo/src/fastai/fastai/.venv/lib/python3.7/site-packages/fastai/text/data.py", line 170, in load
    vocab = Vocab(pickle.load(open(cache_path/'itos.pkl','rb')))
FileNotFoundError: [Errno 2] No such file or directory: '/home/cosimo/.fastai/data/imdb_sample/tmp/itos.pkl'

and here’s the code I’m running:

$ cat train.py
#!/usr/bin/env python

from fastai import *
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

df = pd.read_csv(path/'texts.csv')
df.head()

print(df['text'][1])

# I already saved the model, so no need to do that again
#data_lm = TextClasDataBunch.from_csv(path, 'texts.csv')
#data_lm.save()

# This line fails because the int-to-string vocabulary (itos.pkl) does not exist on disk
data = TextClasDataBunch.load(path)
data.show_batch()

I tried to track down this error, and I believe either some files are missing from the imdb-sample dataset (?) or I need to download another dataset containing the actual pre-trained language model (?).

I see files in courses/dl2/imdb_scripts/* that look like scripts used to generate the pre-trained model. I tried to use them, but they in turn require other files that I haven't figured out how to generate.

I'll try to figure this out further, but if I understood correctly, it might be easier to just grab the “pre-trained model” files from some notebook directory.

Meanwhile, if anybody has ideas here, by all means let me know. Thanks!
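The traceback shows TextClasDataBunch.load() reading path/'tmp'/'itos.pkl', while the commented-out data_lm.save() call in train.py writes the newer single-file format (data_save.pkl by default) that load_data() reads. That points to a save/load mismatch rather than files missing from the dataset, though I have not verified it against 1.0.60 specifically. A quick way to check is to see which of the two files exists; saved_data_report is a hypothetical pathlib helper that does just that.

```python
from pathlib import Path


def saved_data_report(path):
    """Report which saved-DataBunch files exist under path.

    'data_save.pkl' is the default file written by DataBunch.save() and read
    by load_data(); 'tmp/itos.pkl' is the vocabulary file the older
    TextClasDataBunch.load() tries to open, per the traceback in this thread.
    """
    path = Path(path)
    candidates = {
        "load_data (data_save.pkl)": path / "data_save.pkl",
        "legacy .load (tmp/itos.pkl)": path / "tmp" / "itos.pkl",
    }
    return {label: p.exists() for label, p in candidates.items()}
```

Called as saved_data_report(untar_data(URLs.IMDB_SAMPLE)): if only the data_save.pkl entry is True, data = load_data(path) is the call to try.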

Mukesh · April 15, 2020, 8:54am

Hey jaidisido, I know it's a bit late relative to the course, but when using the following line:
data = TextDataBunch.load(path)

I have the following outcome:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.fastai/data/imdb_sample/tmp/itos.pkl'

Can you share with me how you resolved this problem?

kanz.2890 · May 8, 2020, 7:39pm

Hi,

I have created a custom DataBunch which I am trying to load using load_data, but I am getting an AttributeError:

  File "/home/views.py", line 641, in get
    path, r"/home/data_save.pkl")
  File "/usr/local/lib/python3.7/site-packages/fastai/basic_data.py", line 281, in load_data
    ll = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
AttributeError: Can't get attribute 'RobertaTextList' on <module '__main__' from 'manage.py'>

RobertaTextList is defined in the program, but I still get the error.

Maybe I have to define this class, or import it, in the context where I'm loading the DataBunch, but I don't know how.
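The AttributeError comes from pickle, which torch.load uses under the hood: pickle stores a class by module and name, and the error shows RobertaTextList was recorded as __main__.RobertaTextList, i.e. it was defined in the script or notebook that saved the DataBunch. When Django starts via manage.py, __main__ is manage.py, which has no such attribute. One workaround, sketched here under the assumption that the class can be imported where load_data is called, is to attach it to __main__ before loading. expose_in_main is a hypothetical helper name.

```python
import sys


def expose_in_main(*classes):
    """Make top-level classes resolvable as __main__.<Name> when unpickling.

    pickle looks classes up by module and name at load time, so an object
    saved from a script/notebook needs '<Name>' on whatever module is
    __main__ in the loading process (manage.py under Django).
    """
    main = sys.modules["__main__"]
    for cls in classes:
        setattr(main, cls.__name__, cls)
```

In views.py that would look like from my_models import RobertaTextList (my_models is a placeholder for wherever the class lives), then expose_in_main(RobertaTextList), then load_data(...). The class must match the definition used when saving. A longer-term fix is to define the class in an importable module before saving, so pickle records that module's name instead of __main__.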

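This is the code:

from fastai.text import *  # fastai v1: brings in Learner, load_data, accuracy, Path, torch

path = Path()
# Loading the DataBunch saved as path/'data_save.pkl'
data = load_data(path, r"data_save.pkl")
roberta_model = CustomRobertaModel()  # user-defined model class
learn = Learner(data, roberta_model, metrics=[accuracy])
st2 = torch.load(r"final_model_base.pth", map_location=torch.device('cpu'))
# load_state_dict copies the weights in; state_dict(st2) would instead
# write the model's current weights into st2 and load nothing
learn.model.load_state_dict(st2)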

Can anyone help me with this?

kopalsoni · June 28, 2020, 2:37am

Only this line of code worked for me

data_lm = load_data(path, 'data_save.pkl')

Quick tip: I went to the terminal and checked the files/folders in ‘/Users/kopal/.fastai/data/imdb_sample/’ and found ‘data_save.pkl’ and ‘texts.csv’. load_data worked with the .pkl file and I was good to go.
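The same directory check can be done from Python. list_pickles is a hypothetical helper built on pathlib; with recursive=True it also picks up files in subfolders, such as the tmp/ cache that the older .load() methods read from.

```python
from pathlib import Path


def list_pickles(path, recursive=False):
    """Return .pkl files under path as sorted POSIX-style relative paths."""
    root = Path(path)
    pattern = "**/*.pkl" if recursive else "*.pkl"
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))
```

For example, list_pickles(untar_data(URLs.IMDB_SAMPLE)) should show data_save.pkl if save() has been run; any top-level name it returns can be passed as the second argument, as in load_data(path, 'data_save.pkl').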