Platform: GCP ✅

The same happens to me. What drives me crazy is that Google Colab, a FREE service, still offers free GPU computing, although for long trainings it’s not an optimal solution.
If it helps, on Linux you can use the following command to find out which zones have GPU accelerators available (at least in theory):

gcloud compute accelerator-types list | grep europe | grep p100

This will list all Europe zones with NVIDIA P100 GPUs.
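As a sketch of an alternative (the accelerator name and zone pattern here are just examples, not something tied to this thread), gcloud’s own --filter flag can do the same job as the grep pipeline:

gcloud compute accelerator-types list --filter="name=nvidia-tesla-t4 AND zone~europe"

The same pattern should work for nvidia-tesla-p100, nvidia-tesla-v100, and so on.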

I was just able to run an instance in zone europe-west4-b using a V100 and a T4 GPU.


Thanks Django. It saves time when testing GPU availability in each region.

@hellohei
I wonder if you might need to verify your GPU quota. Make sure you have a value of at least 1 in the limit field.
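In case it helps, a quick way to check those quotas from the command line is sketched below (the region is just an example, and the grep is only there to pull out the relevant lines, so the exact output layout may differ):

# project-wide limit, i.e. the ‘GPUs (all regions)’ quota
gcloud compute project-info describe | grep -B 1 -A 1 GPUS_ALL_REGIONS

# per-region GPU quotas, e.g. NVIDIA_T4_GPUS
gcloud compute regions describe us-west1 | grep -B 1 GPU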

Hi @npatta01
I found that the committed GPU quotas in all regions are 0.
Do they expire after a period of time?

Hello there, I have a problem on my GCP virtual machine: I can’t save any trained model, not even the one trained in the lesson 1 (pets) notebook.
The export command, by the way, works just fine.
Can anybody reproduce my issue, or does anyone know what to try in order to fix this?
The error that I get is the following:

TypeError                                 Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329 

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
    395 
--> 396     pickle_module.dump(MAGIC_NUMBER, f, protocol=pickle_protocol)
    397     pickle_module.dump(PROTOCOL_VERSION, f, protocol=pickle_protocol)

TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 learn.save()

/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py in save(self, file, return_path, with_opt)
    252         if not with_opt: state = get_model(self.model).state_dict()
    253         else: state = {'model': get_model(self.model).state_dict(), 'opt':self.opt.state_dict()}
--> 254         torch.save(state, target)
    255         if return_path: return target
    256 

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    326 
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329 
    330 

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in __exit__(self, *args)
    205 class _open_buffer_writer(_opener):
    206     def __exit__(self, *args):
--> 207         self.file_like.flush()
    208 
    209 

AttributeError: 'NoneType' object has no attribute 'flush'

The output of show_install(1) is:

=== Software === 
python        : 3.7.6
fastai        : 1.0.61
fastprogress  : 0.2.2
torch         : 1.4.0
nvidia driver : 418.87
torch cuda    : 10.1 / is available
torch cudnn   : 7603 / is enabled

=== Hardware === 
nvidia gpus   : 1
torch devices : 1
  - gpu0      : 15079MB | Tesla T4

=== Environment === 
platform      : Linux-4.9.0-12-amd64-x86_64-with-debian-9.12
distro        : #1 SMP Debian 4.9.210-1 (2020-01-20)
conda env     : base
python        : /opt/conda/bin/python
sys.path      : /home/jupyter/tutorials/fastai/course-v3/nbs/dl1
/opt/conda/lib/python37.zip
/opt/conda/lib/python3.7
/opt/conda/lib/python3.7/lib-dynload

/opt/conda/lib/python3.7/site-packages
/opt/conda/lib/python3.7/site-packages/IPython/extensions
/home/jupyter/.ipython

Mon May 18 07:32:03 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P0    28W /  70W |   9785MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1342      C   /opt/conda/bin/python                       9773MiB |
+-----------------------------------------------------------------------------+
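For reference, the traceback above shows learn.save() being called without a file name. A minimal sketch of the call with an explicit name (the name here is just an example, and I’m not certain this is actually the cause) would be:

learn.save('stage-1')  # should write models/stage-1.pth under the learner's path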

Hi Guys,

I am new to fast.ai and am trying to set up GCP on a Windows 10 machine. I have downloaded the Ubuntu command prompt, with the following Ubuntu version:
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal

I am getting the following error when trying to run:

sudo apt-get update && sudo apt-get install google-cloud-sdk

E: The repository 'http://packages.cloud.google.com/apt cloud-sdk-focal Release' does not have a Release file.

Hey @vk3220, I had the same issues setting up yesterday; the issue really only appears if you use the latest Ubuntu release. You can get past this specific error by adding [trusted=yes] right after deb in the sources.list entry, but I kept running into different errors further down the track. I uninstalled 20.04 and installed 18.04 instead, and from there it was smooth sailing: just following the steps, I was able to set everything up in a few minutes.
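In case it helps anyone else, a sketch of what that edit looks like is below. The file path is the usual location for the Cloud SDK entry and may differ on your setup, and the cloud-sdk-focal suffix is just what the error above suggests was configured:

# /etc/apt/sources.list.d/google-cloud-sdk.list
deb [trusted=yes] http://packages.cloud.google.com/apt cloud-sdk-focal main

This only illustrates the workaround I tried; as mentioned, switching to Ubuntu 18.04 was the more reliable fix for me.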


You need to substitute $INSTANCE_NAME with whatever you named your instance, and do the same for $ZONE unless you use the default.
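For example (a sketch assuming the course’s usual connect command; the zone and instance name below are placeholders, so adjust them to your own):

export ZONE="us-west2-b"
export INSTANCE_NAME="my-fastai-instance"
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080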


Thank you @ascholtes. I had almost given up hope of running it on GCP. It worked as you said after installing the previous version.

Did you use the same credit card in both accounts?

I’m still just trying to set it up. Using the parameters provided (I changed the zone to us-east1-b), I get:

ERROR: (gcloud.compute.instances.create) Could not fetch resource: - The user does not have access to service account 'service-############@compute-system.iam.gserviceaccount.com'. User: 'emailaddress@gmail.com'. Ask a project owner to grant you the iam.serviceAccountUser role on the service account

I already went through the login process using gcloud in Ubuntu, so I’m not sure why it’s asking for this.
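If I’m reading the error right, the grant it asks for would look something like the command below (PROJECT_ID and the email are placeholders, and I’m not certain this is the actual fix):

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:your-account@gmail.com" --role="roles/iam.serviceAccountUser"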

They ultimately said that my phone number was used too many times.

Is anyone else getting preempted constantly? I really like GCP and don’t want to leave but I get booted every 10-45 min these days. It used to be fairly rare. Are there some regions that are slower but would get me preempted less frequently?

Yeah, that was happening to me too. So I decided to just bite the bullet and use non-preemptible instances on my free credit, which will obviously burn it up faster.

But even now I sometimes find it difficult to start a VM because regions/zones don’t have enough resources. I had one running in us-west1-a last week but could not restart it this morning. I was able to start a new one in us-west1-b, but it doesn’t have any of my work.

I’m trying to see if I can create a snapshot of the disk from the instance in us-west1-a and use it to create a new disk in us-west1-b.
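A rough sketch of the gcloud steps I’m attempting (the disk and snapshot names are placeholders for my own):

# snapshot the old disk in us-west1-a
gcloud compute disks snapshot my-old-disk --zone=us-west1-a --snapshot-names=fastai-backup

# create a new disk from that snapshot in us-west1-b
gcloud compute disks create my-new-disk --source-snapshot=fastai-backup --zone=us-west1-b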

So if you create your own notebooks on a GCP instance, I would recommend making sure you also save them locally or somewhere else.

I guess we are not alone: GCP preemptible instances seem unusable


I have had the same problem; sometimes I can’t even start up for hours. It’s really disruptive, and it doesn’t seem like there’s an easy way to contact them and talk to someone.

Shyam, I am facing the same problem. I have tried requesting a quota increase to 1, 2, and even 32 for ‘GPUs (all regions)’. I also wrote in the description that this is for the fast.ai course.

I receive an acknowledgement email for my service request, but then I immediately receive a rejection email. It seems to me like an automated rejection.

Can anybody please guide me through this? What am I doing wrong here?

I couldn’t figure it out either. I even talked to a customer service representative, but she told me that I can’t access a GPU without a month’s usage history in my GCP account. So currently I am using Paperspace; it’s much less hassle, especially for fastai.

Not sure if anyone else is running into the same issue, but I’ve been having trouble getting my models to train on a GPU, or even getting the GPU recognized at all (torch.cuda.is_available() returns False).

Eventually I tried doing some things directly in PyTorch to see if it would work at that level, and it did not. The CUDA driver version was 10010 and the PyTorch version came in at 1.5.0.

I stumbled upon a thread in the PyTorch forums that seems to solve it by downgrading the PyTorch version. Not sure if anyone else is running into the same issue, but I thought I would share since it was quite frustrating for me.
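Concretely, the workaround from that thread amounts to something like the following (the exact versions are an assumption on my part, based on the CUDA 10.1 driver reported above):

# check what torch reports before/after
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

# downgrade to a build made for CUDA 10.1
pip install torch==1.4.0 torchvision==0.5.0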

When it first happened, I ended up creating a brand new instance on GCP the other day because I couldn’t figure out what else to do (it only recently started acting up). Everything was fine in the new instance until today. I wonder if fastai2 has some library dependencies or something in its environment that makes it incompatible with the CUDA drivers on GCP(???)

Same here… I went back to Paperspace. I hope someone from fast.ai discusses this issue with Google.

-Happy learning


Hey all,
I’ve also got the aforementioned problem: my request to increase the ‘GPUs (all regions)’ quota from 0 to 1 was immediately rejected.
If anyone has any suggestions, I’d be glad to hear them, since now, even after spending a whole day setting up my workstation for the course, I can’t actually do anything.