Error Intro Chapter 1

You should also be able to force a fresh download with: path = untar_data(URLs.IMDB, force_download=True)


I split cells: run
source = untar_data(URLs.IMDB)
then
print(source)
Output: /storage/data/imdb
The next cell,
from fastai.text.all import *
dls = TextDataLoaders.from_folder(source, valid='test', bs=32)
fails with the same error as Rajeev's:
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'
However, running
source.ls()
gives
(#7) [Path('/storage/data/imdb/README'),Path('/storage/data/imdb/imdb.vocab'),Path('/storage/data/imdb/tmp_lm'),Path('/storage/data/imdb/tmp_clas'),Path('/storage/data/imdb/unsup'),Path('/storage/data/imdb/test'),Path('/storage/data/imdb/train')]
so there is no imdb_tok directory.
By the way, how can I get a terminal in Paperspace?
Thanks in advance.

@yurirzhanov here is the screenshot

Thank you, but I work on Paperspace and there is no such dropdown menu. Anyway, I do not have the 'imdb_tok' directory that you deleted, only 'imdb'.

@yurirzhanov - Can you share the steps for how you open this particular notebook in Paperspace? I am using the same platform. If I know the steps you follow to access your notebook, then maybe I can help.

Here is the sequence of actions:

Log in.
Choose Gradient (not Core)
Choose Jupyter: Run a sample notebook
Choose Paperspace + Fast.ai
Choose fastbook
Open 01_intro
Run
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

Run
from fastbook import *

Run
#id first training
Runs fine

Run uploader/custom classification
Runs correctly

Run CAMVID_TINY
Runs fine

Run
print(URLs.IMDB)
Output
https://s3.amazonaws.com/fast-ai-nlp/imdb.tgz

Run
source = untar_data(URLs.IMDB)
print(source)
Output
/storage/data/imdb

Run
from fastai.text.all import *
dls = TextDataLoaders.from_folder(source, valid='test', bs=32)
Error
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'

Run
source.ls()
Output
(#7) [Path('/storage/data/imdb/README'),Path('/storage/data/imdb/imdb.vocab'),Path('/storage/data/imdb/tmp_lm'),Path('/storage/data/imdb/tmp_clas'),Path('/storage/data/imdb/unsup'),Path('/storage/data/imdb/test'),Path('/storage/data/imdb/train')]

Here are some screenshots.

  1. Log in and click Gradient as you have done. That should bring you to the following screen. Click on the start button.

  2. Once the running indicator (in green) is on, click on Open V2 beta.

  3. That should bring you to the following screen

Now you can follow the instructions I provided earlier to open the terminal by clicking on the 'New' dropdown at the top right, below the log out button.

I hope this helps.

Thank you so much. It worked (sort of - I ran out of CUDA memory, but that's a different and understandable matter). But just imagine how frustrating it was for me to start repeating the exercises and get stuck in the intro for an unknown reason…

No problem at all. Glad I could help. I understand the frustration. Hopefully others can find and use this post to get unstuck.

This didn't work.

Hey, how do I delete the imdb_tok folder?

Hi,

I found that this worked for me: https://stackoverflow.com/questions/303200/how-do-i-remove-delete-a-folder-that-is-not-empty/
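The accepted answer there boils down to shutil.rmtree. Here is a minimal sketch of that approach for this specific case (assuming the Paperspace cache path /storage/data/imdb_tok; on a local install it is usually ~/.fastai/data/imdb_tok, so adjust the path for your platform):

```python
import shutil
from pathlib import Path

# Location of the partially built tokenized cache on Paperspace;
# change this if your data directory lives somewhere else.
tok_path = Path("/storage/data/imdb_tok")

# Remove the directory and everything inside it so fastai
# rebuilds the tokenized dataset (including counter.pkl) from scratch.
if tok_path.exists():
    shutil.rmtree(tok_path)
```

Unlike rmdir, shutil.rmtree handles non-empty directories, which is the whole point here.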

Hope this helps, I almost lost my mind on this one lol.


Works like a charm! Thanks.

I am also running out of memory. I changed the batch size to 64 (bs=64) and the sequence length to 40 (seq_len=40). It seems to be running now. @yurirzhanov, how did you fix the out-of-memory error? Or where did you look to find the solution?
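For what it's worth, both knobs are keyword arguments to TextDataLoaders.from_folder (seq_len defaults to 72), so the reduced-memory call would look something like the sketch below. This is untested here and uses the values from this post, not magic numbers; tune bs and seq_len to your GPU:

```python
from fastai.text.all import *

source = untar_data(URLs.IMDB)

# Smaller batches and shorter sequences both cut peak GPU memory;
# bs=64 and seq_len=40 are simply the values reported above.
dls = TextDataLoaders.from_folder(source, valid='test', bs=64, seq_len=40)
```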

To remove the imdb_tok directory in Paperspace, you can open a new cell and run this shell command (rmdir only removes empty directories, so use rm -r instead):

!rm -r /storage/data/imdb_tok

It worked for me.

here is a picture with the ls in case it helps

I had the same problem. Your hint helped me to fix it. Thanks.

Receiving the same error in 2022 on a Windows machine.
Here are the steps that fixed it for me:

  1. Delete the imdb_tok folder located at ~\.fastai\data\imdb_tok
  2. Restart the kernel and run the following cells one by one:
import fastbook
fastbook.setup_book()
from fastai.text.all import *
path= untar_data(URLs.IMDB)
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True, n_workers=0),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=16, n_workers=0)

Please note this last step took about 4-5 minutes to finish, but it finally grabbed all the elements needed and put them into the imdb_tok folder (including counter.pkl).

  3. Restart the kernel
  4. Run the original cell as demonstrated in the notebook:
from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=32)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

Hope this helps anyone running into the same issue!

For people on Kaggle, this solved the issue for me, thanks to @daveramseymusic!
