Lesson3-imdb - Loading Databunch fails

Hi,

I am trying to run the lesson3-imdb Jupyter notebook on SageMaker. All cells run well except those that load a previously saved DataBunch object.

Examples of failing cells:
data = TextDataBunch.load(path)
or
data_lm = TextLMDataBunch.load(path, 'tmp_lm', bs=bs)

They all fail with something similar to:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/.fastai/data/imdb_sample/tmp/itos.pkl'
or
NotADirectoryError: [Errno 20] Not a directory: '/home/ec2-user/.fastai/data/imdb/tmp_lm/itos.pkl'

It seems that the ".save" calls preceding the loading cells do not produce the expected output.
For instance,
data_lm.save('tmp_lm') produces a file called 'tmp_lm', not a directory,

which leads to the error:
NotADirectoryError: [Errno 20] Not a directory: '/home/ec2-user/.fastai/data/imdb/tmp_lm/itos.pkl'
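The error message suggests the old load methods expect a directory such as tmp_lm/ containing itos.pkl, while the newer save writes a single file. A minimal stdlib sketch for telling the two layouts apart; the helper name `saved_format` is hypothetical (not part of fastai), and the "legacy" layout is only inferred from the paths in the errors above:

```python
from pathlib import Path
import tempfile

def saved_format(path, name):
    """Hypothetical helper: classify what a DataBunch save left on disk."""
    target = Path(path) / name
    if target.is_file():
        # Newer save(): the whole DataBunch pickled into one file
        return "pickle-file"
    if target.is_dir() and (target / "itos.pkl").exists():
        # Layout the old .load() methods appear to expect: a folder holding itos.pkl
        return "legacy-dir"
    return "missing"

# Usage: reproduce the situation from the post in a throwaway folder
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "tmp_lm").write_bytes(b"")  # save() produced a file, not a directory
    print(saved_format(d, "tmp_lm"))
```

If this reports "pickle-file", the data was written in the newer format and needs the newer loader rather than TextLMDataBunch.load.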

Is anyone else encountering these errors?

Thank you

See my response in this thread:


@neuradai you are a life saver, it worked, thank you! I was starting to worry that I had run an EC2 instance for 48h for nothing!


Hello,

I am getting a similar error and tried the suggested fix, but I now get: FileNotFoundError: [Errno 2] No such file or directory: '/home/nbuser/courses/fast-ai/course-v3/nbs/data/imdb_sample/tmp/itos.pkl'

I checked, and no "tmp" folder gets created on save.
Updated code:
#data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm = load_data(path, bs=8)

Please advise.

Hi @Devv - try specifying the name of the file you wish to load. In my case:
data_lm = load_data(path, 'tmp_lm', bs=bs)

Hi,
@jaidisido thanks for your feedback. But I think the issue is that the method expects a pickle file called itos.pkl in the tmp/ folder. That file never gets created; instead there is a pickle file called data_save.pkl. It seems it's expecting a different pickle file.

So my new code is:
data_lm = load_data(path, 'data_save.pkl')  # this works for now
data_lm.save()
data = TextDataBunch.load(path)
The last line throws the error "FileNotFoundError: [Errno 2] No such file or directory: '/home/nbuser/courses/fast-ai/course-v3/nbs/data/imdb_sample/tmp/itos.pkl'".

I re-downloaded the fastai library and PyTorch a couple of days ago, but that did not fix the issue.

Please advise.

The load method of the TextDataBunch class and its descendants is deprecated in the latest version of fastai. The new save method “pickles” the DataBunch it is called on. Therefore, the 2nd and 3rd lines of your code above are unnecessary: all you need is load_data, and data_lm is ready to go.
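To illustrate why save followed by load_data replaces the old load method, here is a stdlib-only analogue of the pattern described: save pickles the whole object into one file under its path, and a module-level loader unpickles it from a path. `ToyDataBunch` and `toy_load_data` are hypothetical stand-ins, not fastai's code; the default name data_save.pkl mirrors the file mentioned earlier in this thread, and the exact keyword for the file argument in fastai may differ between v1 releases.

```python
import pickle
import tempfile
from pathlib import Path

class ToyDataBunch:
    """Hypothetical stand-in for a DataBunch: a path, a vocab and a batch size."""
    def __init__(self, path, itos, bs=64):
        self.path, self.itos, self.bs = Path(path), itos, bs

    def save(self, fname="data_save.pkl"):
        # Everything goes into ONE pickle file under self.path, not a folder of pieces
        with open(self.path / fname, "wb") as f:
            pickle.dump(self, f)

def toy_load_data(path, fname="data_save.pkl", bs=64):
    # A free function called with a path, not a method on an existing object
    with open(Path(path) / fname, "rb") as f:
        data = pickle.load(f)
    data.bs = bs  # batch size chosen at load time
    return data

with tempfile.TemporaryDirectory() as d:
    ToyDataBunch(d, ["xxunk", "xxpad", "the"]).save("data_lm.pkl")
    data_lm = toy_load_data(d, "data_lm.pkl", bs=48)
    print(data_lm.itos, data_lm.bs)
```

Because the saved file already contains the full object, there is nothing left for a separate itos.pkl-based load step to do.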


I'm running through the imdb notebook and load_data is not defined. Any idea what's going on?

load_data(path, 'data_lm.pkl', bs=bs)
data_lm = load_data(path, 'data_lm.pkl', bs=bs)

NameError Traceback (most recent call last)
in ()
----> 1 data_lm = load_data(path, 'data_lm.pkl', bs=bs)

NameError: name 'load_data' is not defined

@bbiseda: As @neuradai said, the load method of TextDataBunch and its descendants is deprecated, so you need to update your fastai package via conda/pip and the course notebooks via git pull. This should resolve the following error:

NameError: name 'load_data' is not defined

Please see the "Returning to work" section of the course v3 documentation for your platform of choice. For example, I am using AWS, so I am following this guide:
https://course.fast.ai/update_aws.html


Thank you. This resolved the issue.


Any time :)

What import do I need for my Jupyter notebook to find load_data? I have updated my fastai library, done a git pull and restarted my server, and it's still not able to call load_data.


You might want to check which version of fastai you are on, to make sure you have updated to the latest version, which as of now is 1.0.48.
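A minimal sketch of that check, run inside the notebook so it reports the environment the kernel actually uses. It assumes Python 3.8+ for `importlib.metadata` (on older kernels, `import fastai; fastai.__version__` should give the same string), and treats 1.0.48 as the threshold only because that is the version quoted above; `version_tuple` and `is_at_least` are hypothetical helper names.

```python
import re
from importlib.metadata import version, PackageNotFoundError

def version_tuple(v):
    # Keep only the leading numeric release part: "1.0.50.post1" -> (1, 0, 50)
    m = re.match(r"(\d+(?:\.\d+)*)", v)
    return tuple(int(x) for x in m.group(1).split(".")) if m else ()

def is_at_least(installed, required):
    # Tuple comparison orders release numbers component by component
    return version_tuple(installed) >= version_tuple(required)

try:
    installed = version("fastai")
    print("fastai", installed, "ok" if is_at_least(installed, "1.0.48") else "too old")
except PackageNotFoundError:
    print("fastai is not installed in this kernel's environment")
```

Running this in a terminal instead of the notebook can be misleading if the two use different Python environments.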

Are you on GCP? If you are, I had the same issue, which I resolved by following the instructions here.


On GCP, you need to use the following command:

sudo /opt/anaconda3/bin/conda install fastai -c fastai -c pytorch -c conda-forge
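A command like the one above updates a specific conda installation, so if load_data is still missing afterwards, the notebook kernel may be running a different Python. A minimal stdlib sketch to check this from inside the notebook; it assumes fastai v1, where load_data is defined in fastai.basic_data (and is also pulled in by `from fastai.text import *`). `has_load_data` is a hypothetical helper name.

```python
import sys
import importlib.util

# Which interpreter is this notebook kernel running?
print("kernel python:", sys.executable)

# Where would `import fastai` be loaded from in this kernel, if anywhere?
spec = importlib.util.find_spec("fastai")
print("fastai location:", spec.origin if spec else "not importable")

def has_load_data():
    """Return True if the module-level load_data can be imported in this kernel."""
    try:
        from fastai.basic_data import load_data  # noqa: F401
    except ImportError:
        return False
    return True

print("load_data available:", has_load_data())
```

If the kernel's interpreter is not under the conda prefix that was updated (here /opt/anaconda3), that mismatch would explain an update that seems to have no effect.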


Thank you @coral! My problem was that fastai wasn’t updating to the latest version, as you predicted.

Thanks for the note… by the way, could you help me by 'liking' the answer? I'm trying to collect likes to get into FastAI round 2.


@coral
I also have an issue when I run load_data. My fastai is the latest version. Could you give me a hint too?

conda list fastai

# Name    Version        Build  Channel
fastai    1.0.50.post1   1      fastai

Below are the error details:
data_lm.save(LM_PATH/'data_lm.pkl')
data_lm.load_data(LM_PATH, 'data_lm.pkl', bs=bs)

AttributeError Traceback (most recent call last)
in
----> 1 data_lm.load_data(LM_PATH)

/home/application/anaconda/lib/python3.7/site-packages/fastai/basic_data.py in __getattr__(self, k)
120 return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)
121
--> 122 def __getattr__(self,k:int)->Any: return getattr(self.train_dl, k)
123 def __setstate__(self,data:Any): self.__dict__.update(data)
124

/home/application/anaconda/lib/python3.7/site-packages/fastai/basic_data.py in __getattr__(self, k)
36
37 def __len__(self)->int: return len(self.dl)
---> 38 def __getattr__(self,k:str)->Any: return getattr(self.dl, k)
39 def __setstate__(self,data:Any): self.__dict__.update(data)
40

/home/application/anaconda/lib/python3.7/site-packages/fastai/basic_data.py in DataLoader___getattr__(dl, k)
18 torch.utils.data.DataLoader.__init__ = intercept_args
19
---> 20 def DataLoader___getattr__(dl, k:str)->Any: return getattr(dl.dataset, k)
21 DataLoader.__getattr__ = DataLoader___getattr__
22

/home/application/anaconda/lib/python3.7/site-packages/fastai/data_block.py in __getattr__(self, k)
624 res = getattr(y, k, None)
625 if res is not None: return res
--> 626 raise AttributeError(k)
627
628 def __setstate__(self,data:Any): self.__dict__.update(data)

AttributeError: load_data

LM_PATH is a PosixPath:
LM_PATH
PosixPath('/home/workspace/nlpproject/data/lm')
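The traceback is consistent with load_data being looked up as an attribute of data_lm: each `__getattr__` frame forwards the unknown name one level down (DataBunch to loader to dataset) until the last one raises `AttributeError(k)`, which is why the message is just "load_data". Since load_data is a module-level function in fastai v1, it has to be called as `load_data(LM_PATH, ...)`, not `data_lm.load_data(...)`. A stdlib-only sketch of that forwarding chain; every class and the `load_data` below are hypothetical stand-ins, not fastai's code.

```python
class ToyDataset:
    """Hypothetical innermost object, playing the role of the dataset."""
    vocab = ["xxunk", "xxpad"]

    def __getattr__(self, k):
        # Last stop: nothing left to forward to, so fail with the bare name
        raise AttributeError(k)

class ToyLoader:
    """Hypothetical loader: forwards unknown attributes to its dataset."""
    def __init__(self, dataset):
        self.dataset = dataset

    def __getattr__(self, k):
        # Only called when normal attribute lookup fails
        return getattr(self.dataset, k)

class ToyDataBunch:
    """Hypothetical DataBunch: forwards unknown attributes to its training loader."""
    def __init__(self, train_dl):
        self.train_dl = train_dl

    def __getattr__(self, k):
        return getattr(self.train_dl, k)

def load_data(path, fname="data_save.pkl", bs=64):
    """Stand-in for the module-level loader: a free function, not a method."""
    return f"would unpickle {path}/{fname} with bs={bs}"

data_lm = ToyDataBunch(ToyLoader(ToyDataset()))
print(data_lm.vocab)  # forwarded down two levels and found on the dataset

try:
    data_lm.load_data("data/lm", "data_lm.pkl")  # treated as an attribute lookup
except AttributeError as e:
    print("AttributeError:", e)

print(load_data("data/lm", "data_lm.pkl", bs=48))  # the free-function call works
```

The forwarding is convenient for reaching dataset attributes through the DataBunch, but it also means a misplaced function call surfaces deep inside data_block.py rather than at the call site.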

Try this:

data_lm.save(fname='data_lm.pkl')
data_lm = load_data(LM_PATH, fname='data_lm.pkl', bs=bs)

I still get the same error. After running data_lm.save(fname='data_lm.pkl'), I can see that data_lm.pkl is saved under data/lm/, but load_data still gives the same error output. I also tried data_lm.load_data(path=LM_PATH, fname='data_lm.pkl', bs=bs), but the error is the same.

Please post your detailed error message.