You should also be able to do: path = untar_data(URLs.IMDB, force_download=True)
I split it into cells. Running
source = untar_data(URLs.IMDB)
and then
print(source)
gives the output:
/storage/data/imdb
But
from fastai.text.all import *
dls = TextDataLoaders.from_folder(source, valid='test', bs=32)
fails with the same error as Rajeev's:
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'
However, running
source.ls()
gives
(#7) [Path('/storage/data/imdb/README'),Path('/storage/data/imdb/imdb.vocab'),Path('/storage/data/imdb/tmp_lm'),Path('/storage/data/imdb/tmp_clas'),Path('/storage/data/imdb/unsup'),Path('/storage/data/imdb/test'),Path('/storage/data/imdb/train')]
so there is no imdb_tok directory.
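One caveat (a guess on my part, based on the traceback): imdb_tok would be created next to imdb under /storage/data, while source.ls() only lists the contents of imdb itself, so a stale cache would not show up in that listing either way. A minimal sketch to check for one, where tok_cache_is_stale is a hypothetical helper, not part of fastai:

```python
from pathlib import Path

def tok_cache_is_stale(tok_dir):
    """Hypothetical helper (not part of fastai): True if a *_tok cache
    directory exists but lacks the counter.pkl file fastai expects,
    i.e. a tokenization run was interrupted partway through."""
    tok_dir = Path(tok_dir)
    return tok_dir.exists() and not (tok_dir / 'counter.pkl').exists()

# The tok cache is a sibling of the dataset directory, so check the parent
# of the imdb folder (path taken from the traceback in this thread):
print(tok_cache_is_stale('/storage/data/imdb_tok'))
```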
By the way, how can I get a terminal in Paperspace?
Thanks in advance.
Thank you. But I work on Paperspace. There is no such dropdown menu. Anyway, I do not have the 'imdb_tok' directory that you deleted. Only 'imdb'.

@yurirzhanov - Can you share the steps for how you open this particular notebook in Paperspace? I am using the same platform. If I know the steps you follow to access your notebook then maybe I can help.
Here is the sequence of actions:
Log in.
Choose Gradient (not Core)
Choose Jupyter: Run a sample notebook
Choose Paperspace + Fast.ai
Choose fastbook
Open 01_intro
Run
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
Run
from fastbook import *
Run
#id first training
Runs fine
Run uploader/custom classification
Runs correctly
Run CAMVID_TINY
Runs fine
Run
print(URLs.IMDB)
Output
https://s3.amazonaws.com/fast-ai-nlp/imdb.tgz
Run
source = untar_data(URLs.IMDB)
print(source)
Output
/storage/data/imdb
Run
from fastai.text.all import *
dls = TextDataLoaders.from_folder(source, valid='test', bs=32)
Error
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'
Run
source.ls()
Output
(#7) [Path('/storage/data/imdb/README'),Path('/storage/data/imdb/imdb.vocab'),Path('/storage/data/imdb/tmp_lm'),Path('/storage/data/imdb/tmp_clas'),Path('/storage/data/imdb/unsup'),Path('/storage/data/imdb/test'),Path('/storage/data/imdb/train')]
Here are some screenshots.
-
Login and click Gradient as you have done. That should bring you to the following screen. Click on the start button
-
Once the running indicator (in green) is on, click on
Open V2 beta
-
That should bring you to the following screen
Now you can follow the instructions I provided earlier to open the terminal by clicking on the 'New' dropdown at the top right, below the log-out button.
I hope this helps.
Thank you so much. It worked (sort of - I ran out of CUDA memory, but that's a different and understandable matter). But just imagine how frustrating it was for me to start repeating the exercises and get stuck in the intro for an unknown reason…
No problem at all. Glad I could help. I understand the frustration. Hopefully others can find and use this post to get unstuck.
This didn't work for me.
Hi,
I found that this worked for me: https://stackoverflow.com/questions/303200/how-do-i-remove-delete-a-folder-that-is-not-empty/
Hope this helps, almost lost my mind on this one lol.
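In case the link dies: the approach in that answer boils down to shutil.rmtree, which you can also run from a notebook cell. A minimal sketch, with the path taken from the error message in this thread (adjust it for your machine):

```python
import shutil
from pathlib import Path

# Assumed location of the partially written token cache (from the traceback).
tok_dir = Path('/storage/data/imdb_tok')

if tok_dir.exists():
    shutil.rmtree(tok_dir)  # removes the directory even if it is not empty

print(tok_dir.exists())
```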
Works like a charm! Thanks.
I am also running out of memory. I changed the batch size to 64 (bs=64) and the sequence length to 40 (seq_len=40). It seems to be running now. @yurirzhanov, how did you fix the out-of-memory error? Or where did you look to find the solution?
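For what it's worth, a rough way to see why those two knobs help (the scaling argument is my assumption, and 72 is, as far as I know, fastai's default seq_len): activation memory for the language model grows roughly with the number of tokens processed per batch, i.e. bs * seq_len.

```python
def tokens_per_batch(bs, seq_len):
    """Rough proxy for per-batch activation memory: tokens processed at once."""
    return bs * seq_len

baseline = tokens_per_batch(64, 72)  # bs=64 with the (assumed) default seq_len=72
reduced = tokens_per_batch(64, 40)   # the settings reported above to fit in memory

print(baseline, reduced)  # 4608 2560 -> about 44% fewer tokens per step
```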
To remove the imdb_tok directory in Paperspace you can open a new cell and run this shell command (note: plain rmdir only removes empty directories, so use rm -rf here):
!rm -rf /storage/data/imdb_tok
It worked for me.
here is a picture with the ls in case it helps
I had the same problem. Your hint helped me to fix it. Thanks.
Receiving the same error in 2022 on a Windows machine.
Here are the steps that fixed it for me:
- Delete the imdb_tok folder located at
~\.fastai\data\imdb_tok
- Restart the kernel and run the following cells one by one:
import fastbook
fastbook.setup_book()
from fastai.text.all import *
path = untar_data(URLs.IMDB)
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True, n_workers=0),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=16, n_workers=0)
Please note this last execution took about 4-5 minutes to finish, but it finally grabbed all the elements needed and put them into the imdb_tok folder (including counter.pkl).
- Restart the kernel
- Run the original cell as demonstrated in the notebook:
from fastai.text.all import *
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=32)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)
Hope this helps anyone running into the same issue.