Error Intro Chapter 1

when I run this code in https://n5kk1g6b.gradient.paperspace.com/notebooks/course-v4/nbs/01_intro.ipynb

from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

I get

FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'

Has anyone seen this problem? I checked the forum; there is a similar question, but I did not see a response. I am using the Gradient Free-P5000.

Complete error

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-3-5ab79cd5e866> in <module>
      1 from fastai.text.all import *
      2 
----> 3 dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
      4 learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
      5 learn.fine_tune(4, 1e-2)

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/text/data.py in from_folder(cls, path, train, valid, valid_pct, seed, vocab, text_vocab, is_lm, tok_tfm, seq_len, backwards, **kwargs)
    222         "Create from imagenet style dataset in `path` with `train` and `valid` subfolders (or provide `valid_pct`)"
    223         splitter = GrandparentSplitter(train_name=train, valid_name=valid) if valid_pct is None else RandomSplitter(valid_pct, seed=seed)
--> 224         blocks = [TextBlock.from_folder(path, text_vocab, is_lm, seq_len, backwards) if tok_tfm is None else TextBlock(tok_tfm, text_vocab, is_lm, seq_len, backwards)]
    225         if not is_lm: blocks.append(CategoryBlock(vocab=vocab))
    226         get_items = partial(get_text_files, folders=[train,valid]) if valid_pct is None else get_text_files

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/text/data.py in from_folder(cls, path, vocab, is_lm, seq_len, backwards, min_freq, max_vocab, **kwargs)
    210     def from_folder(cls, path, vocab=None, is_lm=False, seq_len=72, backwards=False, min_freq=3, max_vocab=60000, **kwargs):
    211         "Build a `TextBlock` from a `path`"
--> 212         return cls(Tokenizer.from_folder(path, **kwargs), vocab=vocab, is_lm=is_lm, seq_len=seq_len,
    213                    backwards=backwards, min_freq=min_freq, max_vocab=max_vocab)
    214 

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/text/core.py in from_folder(cls, path, tok, rules, **kwargs)
    276         if tok is None: tok = WordTokenizer()
    277         output_dir = tokenize_folder(path, tok=tok, rules=rules, **kwargs)
--> 278         res = cls(tok, counter=(output_dir/fn_counter_pkl).load(),
    279                   lengths=(output_dir/fn_lengths_pkl).load(), rules=rules, mode='folder')
    280         res.path,res.output_dir = path,output_dir

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastcore/utils.py in load(fn)
    522 def load(fn:Path):
    523     "Load a pickle file from a file name or opened file"
--> 524     if not isinstance(fn, io.IOBase): fn = open(fn,'rb')
    525     try: return pickle.load(fn)
    526     finally: fn.close()

FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'

Try assigning the path on a separate line and inspecting it:

path = untar_data(URLs.IMDB)

Then run commands like

path.ls()

to check where the data is stored.
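Building on that suggestion, here is a minimal sketch (plain pathlib, no fastai needed) of checking whether the tokenized output looks complete before training. The `<name>_tok` sibling-folder layout and `counter.pkl` marker file are assumptions taken from the traceback above:

```python
from pathlib import Path

def tok_output_ok(source: Path) -> bool:
    """Return True if the sibling `<name>_tok` folder looks complete.

    The tokenizer appears to write its output next to the source folder
    and to expect `counter.pkl` inside it; an interrupted run can leave
    the folder without that file (assumption based on the traceback).
    """
    tok_dir = source.parent / f"{source.name}_tok"
    return (tok_dir / "counter.pkl").exists()
```

For example, `tok_output_ok(untar_data(URLs.IMDB))` returning False after a previous run would suggest a stale `imdb_tok` folder.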


Did you figure out what caused the error? I am getting exactly the same one, also on different machines.

I tried what @SamJoel proposed: running path.ls() returns a list of 7 Path objects.

Yes, I did get it working by going into the terminal in Paperspace and deleting the folder imdb_tok completely.


Worked a treat. Thanks!!!

The issue was this: I interrupted the cell while it was running the first time, so the imdb_tok folder had already been created.

Hence, to run the cell again you have to delete the imdb_tok folder (or just rename it) to get things running.

You should also be able to do: path = untar_data(URLs.IMDB, force_download=True)

I split the cells: running
source = untar_data(URLs.IMDB)
then
print(source)
Output: /storage/data/imdb
Then
from fastai.text.all import *
dls = TextDataLoaders.from_folder(source, valid='test', bs=32)
fails with the same error as Rajeev's:
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'
However, running
source.ls()
gives
(#7) [Path('/storage/data/imdb/README'),Path('/storage/data/imdb/imdb.vocab'),Path('/storage/data/imdb/tmp_lm'),Path('/storage/data/imdb/tmp_clas'),Path('/storage/data/imdb/unsup'),Path('/storage/data/imdb/test'),Path('/storage/data/imdb/train')]
so there is no imdb_tok directory.
By the way, how can I get terminal in the paperspace?
Thanks in advance.

@yurirzhanov here is the screenshot

Thank you. But I work on Paperspace, and there is no such dropdown menu. In any case, I do not have the 'imdb_tok' directory that you deleted, only 'imdb'.

@yurirzhanov - Can you share the steps for how you open this particular notebook in Paperspace? I am using the same platform. If I know the steps you follow to access your notebook, then maybe I can help.

Here is the sequence of actions:

Log in.
Choose Gradient (not Core)
Choose Jupyter: Run a sample notebook
Choose Paperspace + Fast.ai
Choose fastbook
Open 01_intro
Run
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

Run
from fastbook import *

Run
#id first training
Runs fine

Run uploader/custom classification
Runs correctly

Run CAMVID_TINY
Runs fine

Run
print(URLs.IMDB)
Output
https://s3.amazonaws.com/fast-ai-nlp/imdb.tgz

Run
source = untar_data(URLs.IMDB)
print(source)
Output
/storage/data/imdb

Run
from fastai.text.all import *
dls = TextDataLoaders.from_folder(source, valid='test', bs=32)
Error
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'

Run
source.ls()
Output
(#7) [Path('/storage/data/imdb/README'),Path('/storage/data/imdb/imdb.vocab'),Path('/storage/data/imdb/tmp_lm'),Path('/storage/data/imdb/tmp_clas'),Path('/storage/data/imdb/unsup'),Path('/storage/data/imdb/test'),Path('/storage/data/imdb/train')]

Here are some screenshots.

  1. Login and click Gradient as you have done. That should bring you to the following screen. Click on the start button

  2. Once the running sign (in green) is on, click on Open V2 beta

  3. That should bring you to the following screen

Now you can follow the instructions I provided earlier to open the terminal by clicking on the ‘New’ dropdown on top right below the log out button

I hope this helps.

Thank you so much. It worked (sort of - I ran out of CUDA memory, but that's a different and understandable matter). But just imagine how frustrating it was for me to start repeating the exercises and get stuck in the intro for an unknown reason...

No problem at all. Glad I could help. I understand the frustration. Hopefully others can find and use this post to get unstuck.