How to run long computations with fastai and GCP

bluteaur · November 2, 2019, 2:19am

Hi,

Edit: The errors below were solved by making sure ipython and torch were at their latest versions. Python 3.7.5 was needed to fix some other errors. Another really weird source of problems was naming my file “code.py”. For example, I had a temporary test file called temp.py containing two lines (import fastai; from fastai.text import *), and running it would immediately run code.py as well, even though temp.py has no reference to it at all. The traceback below shows why: pdb does “import code”, and that picked up my /home/bluteaur/code.py instead of the standard-library module of the same name.

Now I’m just hoping for a solution to run long computations that would take days.

tl;dr: If anyone could suggest how to run a long training process for a fastai project, that would be great. Specifically, I would like to be able to run the code from this Tutorial without keeping a constant SSH connection open from my computer.

I’m having a load of problems getting GCP to work with fastai. At the moment, the only thing I can get to work is Jupyter notebooks, which are great for testing. Unfortunately this requires a constant SSH connection to the VM instance, otherwise progress is lost. I have a training process for BERT NLP that is estimated to take 2-3 days, and I can’t keep my computer connected for that long.
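One direction I can sketch for this, assuming the VM runs Linux and the training code lives in a plain script rather than a notebook: start the script in its own session with its output redirected to a log file, so that closing the SSH connection does not take the process down with it. The helper name launch_detached and the file names in the usage note are made up for illustration, and this is not something from the tutorial.

```python
import subprocess


def launch_detached(command, log_path):
    """Start `command` in a new session, appending its output to `log_path`.

    Assumption: a POSIX system. start_new_session=True runs the child in its
    own session, so it is not tied to the terminal of the SSH login that
    started it.
    """
    log_file = open(log_path, "ab")
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # the job must not wait for keyboard input
        stdout=log_file,            # progress goes to the log file
        stderr=subprocess.STDOUT,   # errors go to the same log
        start_new_session=True,     # detach from the controlling terminal
    )
    log_file.close()  # the child process keeps its own copy of the file handle
    return process
```

For example, calling launch_detached([sys.executable, "train_bert.py"], "train.log") from a short launcher script (both file names hypothetical, with sys imported) would start the job and return straight away. After reconnecting later, progress can be checked by reading train.log. The VM itself still has to stay running for the whole job.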

I want to run a long training job with GCP (or elsewhere, but preferably GCP). I have the code copied and pasted from this example on how to use BERT to classify toxic comments: Tutorial.

Things that have gone wrong (solved in Edit):

None of the fastai imports work. The biggest issue is the error below, which I get when I simply try to run the Python code from the above tutorial using “python code.py”. I tried many suggestions with no luck.

Traceback (most recent call last):
  File "code.py", line 3, in <module>
    from fastai.text import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/text/__init__.py", line 1, in <module>
    from .. import basics
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/basics.py", line 1, in <module>
    from .basic_train import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/basic_train.py", line 2, in <module>
    from .torch_core import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/torch_core.py", line 2, in <module>
    from .imports.torch import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/imports/__init__.py", line 1, in <module>
    from .core import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/imports/core.py", line 17, in <module>
    from pdb import set_trace
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/pdb.py", line 76, in <module>
    import code
  File "/home/bluteaur/code.py", line 4, in <module>
    from fastai.metrics import error_rate
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/metrics.py", line 3, in <module>
    from .callback import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/callback.py", line 2, in <module>
    from .basic_data import *
  File "/home/bluteaur/.conda/envs/fastai/lib/python3.6/site-packages/fastai/basic_data.py", line 5, in <module>
    DatasetType = Enum('DatasetType', 'Train Valid Test Single Fix')
NameError: name 'Enum' is not defined
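
In the traceback, pdb's "import code" resolves to /home/bluteaur/code.py rather than the standard library's code module, because the directory of the script being run is searched first. Here is a minimal sketch of a check for this kind of name clash, assuming a standard CPython install where the standard library sits as plain files under the directory reported by sysconfig. The helper name shadows_stdlib is made up for illustration.

```python
import sysconfig
from pathlib import Path


def shadows_stdlib(script_name):
    """Return True if a script such as 'code.py' shares its module name
    with a standard-library module or package stored as a file on disk.

    Limitation: modules compiled into the interpreter (for example sys)
    have no file in the stdlib directory, so this check does not see them.
    """
    module_name = Path(script_name).stem
    stdlib_dir = Path(sysconfig.get_paths()["stdlib"])
    as_module = stdlib_dir / (module_name + ".py")          # e.g. code.py
    as_package = stdlib_dir / module_name / "__init__.py"   # e.g. json/__init__.py
    return as_module.exists() or as_package.exists()
```

With this, shadows_stdlib("code.py") flags the file name that caused the problem, while a name like temp.py passes, since the standard library has no module called temp.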

My instance has these settings:
export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_TYPE="n1-highmem-8"

           gcloud compute instances create $INSTANCE_NAME \
                   --zone=$ZONE \
                   --image-family=$IMAGE_FAMILY \
                   --image-project=deeplearning-platform-release \
                   --maintenance-policy=TERMINATE \
                   --accelerator="type=nvidia-tesla-p100,count=1" \
                   --machine-type=$INSTANCE_TYPE \
                   --boot-disk-size=200GB \
                   --metadata="install-nvidia-driver=True"

pip list:

Package Version

beautifulsoup4 4.8.1
blis 0.4.1
Bottleneck 1.2.1
certifi 2019.9.11
chardet 3.0.4
cycler 0.10.0
cymem 2.0.2
dataclasses 0.7
fastai 1.0.59
fastprogress 0.1.21
idna 2.8
importlib-metadata 0.23
kiwisolver 1.1.0
matplotlib 3.1.1
more-itertools 7.2.0
murmurhash 1.0.2
numexpr 2.7.0
numpy 1.17.3
nvidia-ml-py3 7.352.0
packaging 19.2
pandas 0.25.3
Pillow 6.2.1
pip 19.3.1
plac 1.1.3
preshed 3.0.2
pyparsing 2.4.2
python-dateutil 2.8.0
pytz 2019.3
PyYAML 5.1.2
requests 2.22.0
scipy 1.3.1
setuptools 41.6.0.post20191030
six 1.12.0
soupsieve 1.9.4
spacy 2.2.2
srsly 0.2.0
thinc 7.3.1
torch 1.3.0
torchvision 0.4.1
tqdm 4.37.0
urllib3 1.25.6
wasabi 0.3.0
wheel 0.33.6
zipp 0.6.0

Any help would be great, thanks