Lesson 1 In-Class Discussion

I’m having the same problem on Paperspace. I set the machine up using curl http://files.fast.ai/setup/paperspace | bash.

Hi all,

I followed the instructions in the video for Paperspace, but was unable to access the Jupyter Notebook externally (I used Ubuntu 16, West Coast, GPU). I tried assigning a static public IP address and configuring Jupyter to serve requests coming from any IP address, but that didn’t work either. I’ve also contacted Paperspace support.

Any ideas?

Thanks,
Rick

Hi Rick

Did you use http://files.fast.ai/setup/paperspace to set it up? Perhaps the firewall isn’t configured to allow 8888?

sudo ufw allow 8888:8898/tcp
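To tell a firewall problem apart from a Jupyter configuration problem, it can help to test whether the port is reachable at all. Below is a minimal sketch using only the Python standard library; the function name port_is_open and the 127.0.0.1 placeholder are my own assumptions, so substitute your machine's public IP when running it from your laptop.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder (assumption); use the machine's public IP instead.
    host = "127.0.0.1"
    state = "reachable" if port_is_open(host, 8888) else "not reachable"
    print(f"{host}:8888 is {state}")
```

If the port is reachable from the machine itself but not from outside, the firewall (or the provider's network settings) is the more likely culprit.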


I did, thanks Ser @davos; and that worked for me. Thanks!

No, that sounds like it’s not using the GPU. Someone else mentioned they had to conda remove and reinstall pytorch after rebooting.
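A quick way to confirm whether PyTorch can see the GPU before reinstalling anything is sketched below. The helper name gpu_status is hypothetical; torch.cuda.is_available() and torch.cuda.get_device_name() are the only PyTorch calls it relies on.

```python
def gpu_status():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but cannot see a CUDA GPU"
    # Device 0 is the first visible GPU.
    return "PyTorch sees a GPU: " + torch.cuda.get_device_name(0)


print(gpu_status())
```

If this reports that no GPU is visible, the driver or the PyTorch build is the place to look rather than the notebook code.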

Feeling pretty good about lesson 1. Here is a recap of what I did.

  • Finished watching the lesson
  • Set up my environment (script crashed when installing anaconda but I was able to hack at it till it worked)
  • Made my own breakfast dataset (French toast, pancakes, and waffles)
  • Tried to re-create the notebook from memory to make an image classifier for my breakfast dataset. I ended up copying a lot of the lesson’s code, but I typed it myself. Typing out the code was where I learned the most during the lesson.
  • Found a useful forum post about multi-class probabilities and a new class for visualizing image model results.

Here are links:
breakfast ipynb. The pictures at the end are pretty interesting. I was really surprised that some of the pancakes were not classified correctly. I was also unable to get the Cyclical Learning Rates to work. I think this might be because of a lack of images (50 train, 50 validation for each category), but any advice here is appreciated.

breakfast dataset


Will 20 images be sufficient for validation?
(20% of the data)
Also, if you downloaded the images manually, there’s a helper script on GitHub to automate the job…
google image downloader was the repository name, if I am correct…
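For anyone who wants to try a 20% validation split, here is a rough sketch that moves a random share of each class from train/ into valid/. It assumes the train/<class>/ and valid/<class>/ folder layout; the function name split_validation and its parameters are made up for illustration.

```python
import random
import shutil
from pathlib import Path


def split_validation(root, fraction=0.2, seed=42):
    """Move a random `fraction` of each class's files from train/ to valid/.

    Returns a dict mapping class name to the number of files moved.
    """
    root = Path(root)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    moved = {}
    class_dirs = sorted(p for p in (root / "train").iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        n_valid = int(len(files) * fraction)
        dest = root / "valid" / class_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n_valid):
            shutil.move(str(f), str(dest / f.name))
        moved[class_dir.name] = n_valid
    return moved
```

Calling split_validation("data/breakfast") (a placeholder path) on a folder with 100 images per class would move 20 of each into valid/.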

In case anyone else hit this error:

  • Error: libSM.so.6: cannot open shared object file: No such file or directory
    • Solution: sudo apt-get install -y python-qt4

So you’re having the same problem as me? Not able to use the GPU?

Aye, I did, but I stepped through the http://files.fast.ai/setup/paperspace script and re-installed a bunch of stuff… something fixed it, but I’m not sure exactly what, sorry.

OK, I managed to solve the problem of PyTorch not using the GPU. I reinstalled the CUDA driver, since I couldn’t run the nvidia-smi command in the terminal. Not sure why the first bash command didn’t install it.

100%|██████████| 360/360 [01:29<00:00, 4.02it/s]
100%|██████████| 32/32 [00:07<00:00, 4.03it/s]
Epoch
100% 3/3 [00:14<00:00, 4.80s/it]
[ 0. 0.05418 0.02841 0.98779]
[ 1. 0.05212 0.02917 0.9873 ]
[ 2. 0.03798 0.02918 0.98926]

However, 4 it/s seems a bit slow, I guess. When I query my NVIDIA M4000 stats, it shows 923MB/8121MB GPU memory usage. Am I not making full use of the GPU?
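To put the numbers from the log and the memory readout side by side, here is a small arithmetic sketch. The helper names are made up, and the batch size passed to images_per_second is an assumption (it is not shown in the log), so treat the images-per-second figure as illustrative only.

```python
def iterations_per_second(iterations, minutes, seconds):
    """Average rate over a progress-bar run of `iterations` steps."""
    return iterations / (minutes * 60 + seconds)


def memory_fraction(used_mb, total_mb):
    """Share of GPU memory in use, as a value between 0 and 1."""
    return used_mb / total_mb


def images_per_second(rate, batch_size):
    """Images processed per second, given iterations/s and a batch size."""
    return rate * batch_size


rate = iterations_per_second(360, 1, 29)  # 360 iterations in 01:29
print(f"average rate: {rate:.2f} it/s")
print(f"GPU memory in use: {memory_fraction(923, 8121):.1%}")
# batch_size=64 is an assumption, not taken from the log above.
print(f"images/s at an assumed batch size of 64: {images_per_second(rate, 64):.0f}")
```

Low memory usage on its own does not prove the GPU is idle; it mostly reflects model and batch size, so the it/s figure is only meaningful relative to how many images each iteration processes.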


Hi KevinB, I came across the same problem, “RuntimeError: CUDNN_STATUS_INTERNAL_ERROR”. How did you solve this? Thanks!

Try deleting your tmp folder. You will need to rerun everything at that point. If I remember correctly, that solved it for me, but I may not…
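In case it helps, here is a sketch of removing that folder from Python. It assumes "tmp folder" means a tmp directory sitting under the dataset path you pass as PATH in the notebook; the function name clear_tmp and the example path are placeholders.

```python
import shutil
from pathlib import Path


def clear_tmp(data_path):
    """Delete <data_path>/tmp if it exists; return True if something was removed."""
    tmp_dir = Path(data_path) / "tmp"
    if tmp_dir.is_dir():
        shutil.rmtree(tmp_dir)
        return True
    return False
```

For example, clear_tmp("data/dogscats") (a placeholder path) followed by restarting the kernel and re-running the notebook from the top.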

PaperSpace Setup Problems.

I am having difficulty with the PaperSpace setup. I have followed the setup on 2 machines, which hit a problem at the same point but manifested it differently.

On both occasions the install “froze” at the “???-seaborne-???” install step. It reached 100% and then nothing happened. No cursor, no command prompt, nothing. If I leave it, eventually the machine goes to sleep and I have to go back to the console and launch the machine again. Refreshing the browser does nothing.
If I install this via a terminal, the terminal disconnects with a “broken pipe” error. I also noted it didn’t have a data directory installed.
If I try to redo the install, it fails as it cannot find a directory to remove, /etc/???confi.d something or other, even though I can navigate to it manually.
My only option was to delete the machine and start again.

Created a new machine, which got stuck at the same point. Left it alone until it eventually went to sleep. Went back to the console and relaunched; this time the data directory was there.
Tried to git pull from root but it didn’t recognise the command. Navigated to the fastai dir and git pull worked OK.
Tried the conda update and it didn’t recognise the command. Navigated to several locations with no luck.
Now the machine has gone to sleep and I had to go to the console and use the machine actions menu to restart.
Either the restart is super slow or it doesn’t work. Just got a cursor with no command prompt.

Anyone else encounter this? Just noticed I’ve now lost the cursor; the in-browser terminal is just blank.

Don’t have this issue with AWS…

I hit the same problem while running the following code.

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(resnet34, sz))
learn = ConvLearner.pretrained(resnet34, data, precompute=True)
learn.fit(0.01, 3)

I did the following steps and that fixed it.

  1. restart the VM
  2. reinstall all necessary libraries
  3. install Pillow 5.0.0 by pip3 install pillow==5.0.0.

Maybe you should try installing Pillow 5.0.0 first.
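To see which Pillow version is currently installed before and after the pip3 step, something like the sketch below should work on Python 3.8 or later. The helper name installed_version is my own; importlib.metadata is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("Pillow")
if found is None:
    print("Pillow is not installed")
elif found == "5.0.0":
    print("Pillow 5.0.0 is installed")
else:
    print(f"Pillow {found} is installed, not 5.0.0")
```

Run it with the same interpreter the notebook kernel uses, since pip3 may install into a different environment.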


I have changed the batch size from the default to 4, 8, and 16, but I am still facing the same problem.
How do I fix it? Please advise.

I am new to Fast AI. When I run the lesson I get “name ‘resnet34’ is not defined”. I had a lot of errors on import, so I used the imports below instead:

from IPython.lib.deepreload import reload as dreload
import PIL, os, numpy as np, math, collections, threading, json, random, scipy
import pandas as pd, pickle, sys, itertools, string, sys, re, datetime, time, shutil, copy
import seaborn as sns, matplotlib.pyplot as plt
import IPython,sklearn, warnings, pdb

Can somebody help me with the error I am getting with resnet34?
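One observation: none of the imports listed above defines the name resnet34, which would explain the NameError. Assuming the lesson notebook gets it through the fastai library (which in turn depends on torch and torchvision), a first step is to check which of those packages the kernel can actually find. The sketch below uses only the standard library; the function name missing_modules is hypothetical.

```python
from importlib.util import find_spec


def missing_modules(names=("fastai", "torch", "torchvision")):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not importable from this kernel:", ", ".join(missing))
else:
    print("fastai, torch and torchvision are all importable")
```

If anything is reported missing, the original import errors are the real problem to fix; replacing the notebook's imports only hides them.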

I had problems with both the RuntimeError: CUDNN_STATUS_INTERNAL_ERROR error and an out of memory error showing up from time to time. What worked for me was to restart the kernel from jupyter notebook (from the top menu: “Kernel > Restart”).

How can I run lesson 1 on a normal notebook? It seems like I need to use Paperspace machines to run this. resnet34 is giving an error. Can somebody explain this?

Thanks, I had the same issue when setting up on AWS.