Lesson3-imdb - Loading Databunch fails

Finally I found the solution. It turns out the path argument can't be set inside the load_data() call itself. It has to be assigned beforehand, as shown below. I've also included the relevant context from my scripts in case other people need it.

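from fastai.text import *

path = LM_PATH  # LM_PATH is defined earlier in my scripts; the .pkl is saved under it
data_lm = TextLMDataBunch.from_csv(path, 'all_texts.csv', text_cols=1, label_cols=0)
data_lm.save('data_lm.pkl')  # the filename must be a quoted string and match the one loaded below
data_lm = load_data(path, fname='data_lm.pkl')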

Basically, assign a value to path before using it. Don't try to set it inside the load_data call.


For anyone finding this thread down the road: “fname=” didn’t seem to work for me anymore (in later fastai v1 releases the second parameter of load_data is named file), but passing the filename as the second positional argument, after path, should still work.

Tip for others who may be running into this: check that your fastai version is up to date, e.g. run pip list | grep fastai to see which version you have installed. Mine was outdated, and I had to run pip install --upgrade fastai to get the latest version.
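If you'd rather check from inside Python (for example in the notebook you're debugging), the standard library's importlib.metadata can report the installed version. This is a minimal sketch assuming Python 3.8 or newer; installed_version and version_tuple are illustrative helper names, not part of fastai. Keep in mind that everything in this thread (DataBunch, load_data) is fastai v1 API, and fastai v2 replaced DataBunch with DataLoaders, so an unpinned upgrade today moves you to a different API.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.0.60' into (1, 0, 60), stopping at the first non-numeric part such as 'dev0'."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


# Rough Python equivalent of `pip list | grep fastai`
found = installed_version("fastai")
print("fastai is not installed" if found is None else f"fastai {found}")
```

Comparing tuples rather than raw strings avoids the trap where "1.0.9" sorts after "1.0.60" as text.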

Hi,
I have a problem: the data_save.pkl file does not appear, even though it does appear in the notebook https://github.com/fastai/course-nlp/blob/master/8-translation-transformer.ipynb. Can someone tell me what is happening?

Hi, I think this link https://s0docs0fast0ai.icopy.site/basic_data.html#load_data might illustrate that load_data is a classmethod of DataBunch. What do you think?

If the load() method of TextDataBunch is deprecated, and TextClasDataBunch is a subclass of TextDataBunch, then TextClasDataBunch.load(path) should also be deprecated, and that turns out to be the case.
If I run TextClasDataBunch.load(path), it also raises “FileNotFoundError: [Errno 2] No such file or directory: '/root/.fastai/data/imdb_sample/tmp/itos.pkl'”.
Is there also a substitute for “TextClasDataBunch.load(path)”?

Use the following to save:
data_lm.save()

Then load it with:
data_lm = load_data(path)
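Both this save()/load_data(path) pair and the explicit-filename form come down to one file: in fastai v1, load_data reads path/file, where file defaults to 'data_save.pkl', the same default DataBunch.save() writes. The standard-library-only sketch below makes that lookup explicit so a wrong path or filename fails with a readable message before torch.load runs. expected_pickle_path and check_before_load are hypothetical helpers, not fastai functions.

```python
from pathlib import Path
from typing import Union

# Default filename of DataBunch.save() and load_data() in fastai v1.
DEFAULT_FILE = "data_save.pkl"


def expected_pickle_path(path: Union[str, Path], file: str = DEFAULT_FILE) -> Path:
    """Build path/file, the location load_data(path, file) reads from."""
    return Path(path) / file


def check_before_load(path: Union[str, Path], file: str = DEFAULT_FILE) -> Path:
    """Return path/file if it exists; otherwise raise, listing the .pkl files that do exist."""
    target = expected_pickle_path(path, file)
    if not target.is_file():
        folder = Path(path)
        found = sorted(p.name for p in folder.glob("*.pkl")) if folder.is_dir() else []
        raise FileNotFoundError(f"{target} not found; .pkl files in {folder}: {found}")
    return target
```

With fastai v1 available, you would call check_before_load(path) just before data_lm = load_data(path). It only inspects the filesystem and never unpickles anything. Because DataBunch.save() writes under the path the DataBunch was built with, path here should be that same directory.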

I'm using fastai 1.0.60 and still getting the same itos.pkl FileNotFoundError. Here’s the complete error message:

$ python train.py
This is a extremely well-made film. The acting, script and camera-work are all first-rate....
Traceback (most recent call last):
  File "train.py", line 17, in <module>
    data = TextClasDataBunch.load(path)
  File "/home/cosimo/src/fastai/fastai/.venv/lib/python3.7/site-packages/fastai/text/data.py", line 170, in load
    vocab = Vocab(pickle.load(open(cache_path/'itos.pkl','rb')))
FileNotFoundError: [Errno 2] No such file or directory: '/home/cosimo/.fastai/data/imdb_sample/tmp/itos.pkl'

and here’s the code I’m running:

$ cat train.py
#!/usr/bin/env python

from fastai import *
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

df = pd.read_csv(path/'texts.csv')
df.head()

print(df['text'][1])

# I already saved the model, so no need to do that again
#data_lm = TextClasDataBunch.from_csv(path, 'texts.csv')
#data_lm.save()

# This line fails because the int-to-string vocabulary (itos.pkl) does not exist on disk
data = TextClasDataBunch.load(path)
data.show_batch()

I tried to track down this error, and I believe it is caused either by some files missing from the imdb-sample dataset (?) or by my needing to download another dataset that contains the actual pre-trained language model (?).

I see files in courses/dl2/imdb_scripts/* that look like the scripts used to generate the pre-trained model. I tried to use them, but they in turn require other files that I haven’t figured out how to generate.

I'll keep digging, but if I understood correctly, it might be easier to just grab the “pre-trained model” files from some notebook directory.

Meanwhile, if anybody has ideas here, by all means let me know. Thanks!
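The traceback above shows TextClasDataBunch.load reading tmp/itos.pkl, which is the older on-disk layout, while the commented-out data_lm.save() in train.py would, in 1.0.60, write a single data_save.pkl instead (the file later replies in this thread load successfully with load_data(path)). Assuming that mismatch is the cause, the hypothetical helper below inspects a directory and reports which loader its contents fit. It uses only pathlib, so it runs without fastai.

```python
from pathlib import Path
from typing import Optional, Union


def saved_format(path: Union[str, Path], file: str = "data_save.pkl",
                 cache_name: str = "tmp") -> Optional[str]:
    """Guess which fastai v1 loader fits the files under `path` (hypothetical helper).

    Returns:
      'load_data' if path/file exists (a single pickled DataBunch, as written by data.save()),
      'legacy'    if path/cache_name/itos.pkl exists (the layout TextDataBunch.load() reads),
      None        if neither exists, i.e. the DataBunch must be rebuilt and saved first.
    """
    folder = Path(path)
    if (folder / file).is_file():
        return "load_data"
    if (folder / cache_name / "itos.pkl").is_file():
        return "legacy"
    return None
```

For the train.py above, 'load_data' would mean replacing TextClasDataBunch.load(path) with load_data(path), while None would mean re-running the commented-out from_csv and save() lines first. The current format is checked first on purpose, since a directory can contain both.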

Hey jaidisido, I know it's a bit late relative to the course, but when I use the following line:
data = TextDataBunch.load(path)

I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.fastai/data/imdb_sample/tmp/itos.pkl'

Can you share with me how you resolved this problem?

Hi,

I have created a custom DataBunch which I am trying to load using load_data, but I am getting an AttributeError:

  File "/home/views.py", line 641, in get
    path, r"/home/data_save.pkl")
  File "/usr/local/lib/python3.7/site-packages/fastai/basic_data.py", line 281, in load_data
    ll = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
AttributeError: Can't get attribute 'RobertaTextList' on <module '__main__' from 'manage.py'>

RobertaTextList is defined in the program, but I still get the error.

Maybe I have to define this class or import it in the context where I’m loading the DataBunch, but I don’t know how.

This is the code:

from fastai.text import *

path = Path()
# Loading the databunch
data = load_data(path, r"data_save.pkl")
roberta_model = CustomRobertaModel()  # my own model class, defined elsewhere
learn = Learner(data, roberta_model, metrics=[accuracy])
st2 = torch.load(r"final_model_base.pth", map_location=torch.device('cpu'))
learn.model.load_state_dict(st2)  # state_dict(st2) would not load the weights

Can anyone help me with this?
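The guess about defining or importing the class where the DataBunch is loaded points in the right direction. torch.load, which load_data calls (see the traceback), unpickles with Python's pickle, and pickle stores a class only as a module name plus class name, never its code. The DataBunch was presumably pickled where RobertaTextList lived in __main__ (a notebook or training script), but under Django __main__ is manage.py, so the lookup fails. The standard-library-only sketch below reproduces that. saving_script is a hypothetical module name, and assigning the class onto __main__ (shown in a comment) is a common workaround, not a fastai API.

```python
import pickle
import sys
import types

# Hypothetical module standing in for wherever RobertaTextList lived when the
# DataBunch was saved (a notebook or script runs as '__main__').
saving_module = types.ModuleType("saving_script")
sys.modules["saving_script"] = saving_module


class RobertaTextList(list):
    """Stand-in for the custom ItemList subclass; the real one is not shown in the thread."""


# Make the class look as if it were defined at the top level of saving_script.
RobertaTextList.__module__ = "saving_script"
RobertaTextList.__qualname__ = "RobertaTextList"
saving_module.RobertaTextList = RobertaTextList

# pickle records only the reference "saving_script.RobertaTextList", not the class body.
payload = pickle.dumps(RobertaTextList([1, 2, 3]))

# Loading side: the module exists but has no such attribute, which mirrors
# '__main__' being manage.py instead of the notebook that saved the data.
del saving_module.RobertaTextList
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as err:
    load_error = err

# Workaround: put the class where pickle will look before loading. In the Django
# view that would be roughly:
#   import __main__
#   __main__.RobertaTextList = RobertaTextList
saving_module.RobertaTextList = RobertaTextList
restored = pickle.loads(payload)
```

A longer-term fix is to define RobertaTextList in an importable module and build and save the DataBunch from code that imports it from there, so the pickle records that module instead of __main__.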

Only this line of code worked for me:

data_lm = load_data(path, 'data_save.pkl')

Quick tip: I went to the terminal and listed the files and folders in '/Users/kopal/.fastai/data/imdb_sample/', where I found 'data_save.pkl' and 'texts.csv'. load_data worked with the .pkl file, and I was good to go.