Platform: Paperspace (Free; Paid options)

Hi @dkobran and @tomg. I have a paid Paperspace subscription and I am getting an error when running the following block of code in Notebook 10.

dlf_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

I also encountered a similar error I posted about here

Can you please advise?

This looks like a question for @sgugger. /storage/data is where the public datasets consumed by your fastai notebooks live. I’m not aware of any dataset called ‘imdb_tok’, just imdb. Sylvain, can you shed some light on this?

@akashpalrecha Sorry about the delay. You could just run a sidecar Job (or Core VM) though I think the best option is probably to launch TensorBoard directly in a notebook:
https://medium.com/hackernoon/how-to-run-tensorboard-for-pytorch-1-1-0-inside-jupyter-notebook-cf6232498a8d. I have never done this but it looks pretty straightforward. If you try it out, let us know how it goes!

imdb_tok is autogenerated by the tokenizer for the imdb dataset.

As per the above, this looks like the output was never generated.

storage/data/ does not have write access, right? Or does it?

I see that the folder does get created but not the pkl file

@tomg Any suggestions on how I should proceed? It seems to me like I will face this issue any time I use Paperspace for NLP-related tasks.

Hi, this doesn’t appear to be a Paperspace specific issue. My fastai notebook generated two pkl files without issue.

:/notebooks# ll storage/data/imdb_tok/
total 8040
drwxr-xr-x 7 root root    4096 Mar 18 05:34 ./
drwxr-xr-x 3 root root    4096 May 23 20:19 ../
-rw-r--r-- 1 root root 3649850 Mar 18 05:34 counter.pkl
-rw-r--r-- 1 root root 3070555 Mar 18 05:34 lengths.pkl
drwxr-xr-x 4 root root    4096 Mar 18 05:26 test/
drwxr-xr-x 2 root root    4096 Mar 18 05:26 tmp_clas/
drwxr-xr-x 2 root root    4096 Mar 18 05:26 tmp_lm/
drwxr-xr-x 4 root root    4096 Mar 18 05:26 train/
drwxr-xr-x 2 root root 1478656 Mar 18 05:34 unsup/

To clarify, /storage/data is fully writeable; only the specific prepopulated dataset directories are not (e.g. storage/data/imdb). It seems that whatever operation you are running is failing to complete properly. For help with that, I ask that you direct your question to the fastai course proctors.

Hope this helps,

–Tom

It has full write access. Only the specific dataset sub-directories are read-only, such as storage/data/imdb, etc.
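To see which directories are writeable from inside a notebook, a quick check along these lines may help. This is only a sketch: `writable_report` is a hypothetical helper name, and `os.access` reports what the permission check says for the current user, so a root user may see "writable" even where the permission bits suggest otherwise.

```python
import os
from pathlib import Path


def writable_report(root):
    """Map '.' (the root itself) and each immediate subdirectory name
    to whether the current user is allowed to write to it."""
    root = Path(root)
    report = {".": os.access(root, os.W_OK)}
    for child in sorted(root.iterdir()):
        if child.is_dir():
            report[child.name] = os.access(child, os.W_OK)
    return report


if __name__ == "__main__":
    # /storage/data is the path mentioned in this thread; skip if it is absent.
    data_root = Path("/storage/data")
    if data_root.is_dir():
        for name, ok in writable_report(data_root).items():
            print(f"{name}: {'writable' if ok else 'read-only'}")
```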

I tried running that block of code after deleting the imdb_tok folder. That seems to have done the trick.
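For anyone who prefers to do this from a notebook cell, here is a minimal sketch of the same fix. `clear_stale_tok_cache` is a hypothetical helper, and treating `counter.pkl` as the marker of a complete cache is an assumption based on the error reported in this thread, not on fastai's documented behavior.

```python
import shutil
from pathlib import Path


def clear_stale_tok_cache(tok_dir, required=("counter.pkl",)):
    """Delete `tok_dir` if it exists but is missing any file in `required`.

    Returns True if the folder was deleted, False otherwise.
    The `required` default is an assumption taken from the reported error.
    """
    tok_dir = Path(tok_dir)
    if not tok_dir.is_dir():
        return False  # nothing to clean up
    missing = [name for name in required if not (tok_dir / name).is_file()]
    if not missing:
        return False  # cache looks complete, leave it alone
    shutil.rmtree(tok_dir)
    return True


# Example call (path taken from the error message in this thread):
# clear_stale_tok_cache("/storage/data/imdb_tok")
```

After the folder is removed, re-running the DataBlock cell should make the tokenizer regenerate it.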

Okay, then I’d guess you had run it once with an older version of fastai2, which didn’t generate all the expected files.

Hello. How to update fastai2 in Paperspace?

I’m not talking about running pip install --upgrade fastai2 in a terminal in Paperspace.
Why? Because the pip version of fastai2 is not regularly updated (which is normal).

In an Ubuntu installation, it is easy: git pull in the fastai2 folder.
In Paperspace?

Thanks for the tip!
I had tried it before posting on the forums, but it gave me the same nginx server error.
It’s weird that there seems to be no way to run TensorBoard on Paperspace Gradient notebooks.
Meanwhile, I’m switching to wandb for my logging needs!

Is it just me, or does it take forever for a notebook on Paperspace to open up? I typically have to wait 10-15 mins, when it used to be instantaneous.

I have noticed this as well. It doesn’t seem to make much difference whether it’s a paid or a free runtime, either.

UPDATE: I got a message back from Paperspace support, saying:

"Thanks for contacting Paperspace. It looks like you are currently running the notebook 'REFERENCENUMBER'. Please make sure to store all of your files within the /storage folder. This alone will improve your provisioning times and decrease any chance of errors."

I haven’t fully figured out how to do this yet, or whether it’s even advised given the course’s instructions, but in case this is useful for someone else, I’m posting the message here. I removed the reference number as it was only specific to me.

UPDATE2: what I did was to move the fastbook folder and the course-v4 folder into the storage folder. You can do this with the GUI. This seems to result in significantly faster load times when starting up the server. YMMV.
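The same step can be scripted instead of using the GUI. The sketch below deliberately copies rather than moves, so the originals stay where they were; `copy_into_storage` is a hypothetical helper, and the example paths are assumptions about where the folders live on a given machine.

```python
import shutil
from pathlib import Path


def copy_into_storage(folders, storage_root):
    """Copy each existing folder into `storage_root`, skipping any whose
    target already exists. Returns the list of newly created paths."""
    storage_root = Path(storage_root)
    copied = []
    for folder in map(Path, folders):
        target = storage_root / folder.name
        if folder.is_dir() and not target.exists():
            shutil.copytree(folder, target)
            copied.append(target)
    return copied


# Example call (paths are assumptions, adjust to your own layout):
# copy_into_storage(["/notebooks/fastbook", "/notebooks/course-v4"], "/storage")
```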

Further UPDATE: I returned to my Paperspace machine today, and all the folders with all my work were gone. There was no fastbook folder and no course-v4. I hadn’t done much, so it wasn’t too big a deal, but I think the issue here was moving things into the storage folder. I’m not sure I’d follow the advice from the support email above any more, given this experience, at least until they explain a bit more about what happened.

I have a problem with Paperspace. Page 44 of the book has a simple IMDB sample. When text_classifier_learner(…) is run, I always get an error: index out of range. The other small examples from chapter 1 of the book worked. I have not done any updates or special steps; I just created a new notebook (using the premade Paperspace fastai v4 Gradient environment).

Are there steps I should take after creating a new notebook to get text_classifier_learner to work with the IMDB dataset?

Hey there, did you find a fix for this problem?

I’m using Paperspace Gradient with the free GPU. I’m on Lesson 8 (notebook 10_nlp).
When I run this:

I get this error:
FileNotFoundError: [Errno 2] No such file or directory: '/storage/data/imdb_tok/counter.pkl'

I could not find counter.pkl in any of the folders yet. How can this be solved?