Platform: GCP ✅

Is there a smarter way to make a dataset available inside Google Compute Engine than starting the instance and downloading the dataset with wget/curl? I wanted to try the WikiArt dataset, but just downloading it would apparently take hours, all while the GPU sits idle.

Finally got everything set up in GCP and ran the lesson-1 notebook :slight_smile:

2 Likes

Unless you’re launching a cluster of instances that need to talk to each other very fast, you don’t really care about zones. You might get some extra lag sometimes, but it probably won’t be noticeable.

@sgugger I’m stuck on the 3rd step after creating the instance. My command:
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
runs successfully, but http://localhost:8080/tree is not working. I’m on macOS.
BTW I’m a newbie to cloud computing and I’m not following the setup process at all.
In the 2nd step I tried to install the Google Cloud CLI on my PC, but it didn’t work. Then I ran the gcloud init command directly in the cloud’s CLI and that worked, but I was unable to connect to my instance terminal. I know basic command-line usage, so I thought I could work with GCP. What do you suggest: shall I switch to Paperspace or Salamander?
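
A note for anyone hitting this: the separator before -L must be a plain double dash --, and copy-pasting from forums sometimes silently turns it into an en dash. A sketch of the full tunnel, assuming $ZONE and $INSTANCE_NAME are exported in your shell:

gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080   # everything after -- is passed to ssh itself
curl -I http://localhost:8080/tree   # in a second terminal: checks the tunnel is alive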

Hey! I suggest a separate SSD for that. I believe you are doing the protein classification challenge :wink:. I am in the process of making a bash script to make this all easier. Will surely share it with you.

2 Likes

I have basically the same question, but with the additional aspect of possibly downloading the data to such a separate disk without having to let the main machine run idle the entire time.

1 Like

Download speed is great on cloud servers, typically around 40 MB/s, so even 30 GB would take around 10-15 minutes. Still, if you are downloading something very large, spin up a shared 1-vCPU instance and attach the disk, download the data, stop the instance, then start a GPU instance with that disk attached. Voilà, you have the data.
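
For reference, a minimal sketch of that workflow with the gcloud CLI (the disk name, instance names, and zone are placeholders):

gcloud compute disks create data-disk --size=200GB --zone=us-west1-b   # persistent disk for the dataset
gcloud compute instances attach-disk cpu-downloader --disk=data-disk --zone=us-west1-b
# ... ssh in, mount the disk, download the data, then:
gcloud compute instances stop cpu-downloader --zone=us-west1-b
gcloud compute instances detach-disk cpu-downloader --disk=data-disk --zone=us-west1-b
gcloud compute instances attach-disk gpu-instance --disk=data-disk --zone=us-west1-b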

4 Likes

Maybe the guide needs to be edited? With the updated image, git clone is no longer necessary. Just forward directly to the local port…

Just to clarify, this will allow two instances (one with a GPU and one without) to share storage, right?

You can use this command to attach a disk from another instance, meaning you can use a CPU-only machine to download data and later attach the disk to the GPU machine. I am trying to do that at the moment.
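
If you actually want both instances to see the disk at the same time, a persistent disk can be attached to several instances as long as every attachment is read-only. A sketch, with placeholder names:

gcloud compute instances attach-disk cpu-instance --disk=data-disk --mode=ro --zone=us-west1-b
gcloud compute instances attach-disk gpu-instance --disk=data-disk --mode=ro --zone=us-west1-b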

6 Likes

Be sure to leave your terminal open while the command is running: if you close the connection, you won’t be able to reach the notebook in your browser.

Noob question, but how do I access the disk once it is attached? Where does it show up in the file system?
Answer: https://devopscube.com/mount-extra-disks-on-google-cloud/
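
In short, the steps from that guide (a minimal sketch; it assumes the fresh disk shows up as /dev/sdb, which you should confirm with lsblk first):

lsblk                                    # find the new, unmounted device
sudo mkfs.ext4 -m 0 -F /dev/sdb          # format it (only on a brand-new, empty disk!)
sudo mkdir -p /mnt/disks/data            # create a mount point
sudo mount -o discard,defaults /dev/sdb /mnt/disks/data
sudo chmod a+w /mnt/disks/data           # let your user write to it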

Currently the way they’ve set up the image doesn’t support “git pull”. I’m going to try to get that fixed. In the meantime, you have to clone the repo yourself to get the most recent updates.
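
For example, something like this (assuming you want the course notebooks; course-v3 is the repo used for this course):

git clone https://github.com/fastai/course-v3.git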

1 Like

Thanks for the great guide! Everything up and running over here…

This may be a paranoid question, but is it safe to use Google’s command-line utilities (gcloud) for accessing GCP? I wasn’t able to find any information about uninstallation on their site, and I don’t know how much “reporting back home” is being done silently.

If anyone faces an issue running fastai like this,

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-0dcb1b68c103> in <module>
----> 1 path = untar_data(URLs.PETS); path

NameError: name 'URLs' is not defined

then you have to update the fastai library.
conda update fastai -c fastai will probably help, bumping the version to 1.0.6.
This may also run into some permission trouble.

The issue is that conda and its related files are installed with owner and group root. Your user doesn’t have permission to update or create directories, so the update fails.

The following commands will get you to fastai 1.0.11.

sudo su                          # become root, since /opt/anaconda3 is root-owned
cd /opt/anaconda3/bin/
ls -l                            # confirm the owner/group on the conda binaries
./conda install anaconda -y
./conda update fastai -c fastai  # pulls fastai up to 1.0.11
./conda update prompt_toolkit
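
An alternative that avoids working as root (a sketch; it assumes your notebook user is jupyter, as in the SSH command above) is to take ownership of the Anaconda tree once and update normally afterwards:

sudo chown -R jupyter:jupyter /opt/anaconda3   # hand the install to your user
conda update fastai -c fastai                  # now runs without permission errors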
1 Like

Hey! One cannot attach a disk in read-write mode to two running instances, as far as I know! You need to detach the disk from the first instance to successfully attach it to the other one.
The command is “gcloud compute instances detach-disk INSTANCE_NAME --disk DISK_NAME”.
Also, I didn’t remember this command; I just ran gcloud compute instances detach-disk --help and the manual opened. You can use that little trick next time. :slight_smile: And if you don’t remember the start of a command, just type gcloud followed by two tabs and it will show you the available subcommands.

1 Like

How do I download my saved CSV file from the GCP instance to my PC?

3 Likes

Use this scp command: gcloud compute scp --recurse example-instance:~/narnia ~/wardrobe

see examples at https://cloud.google.com/sdk/gcloud/reference/compute/scp
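
For a single file such as a CSV, the same command works without --recurse (a sketch; the instance name and filename are placeholders):

gcloud compute scp jupyter@my-instance:~/results.csv . --zone=$ZONE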

from IPython.display import FileLink
FileLink('/path/to/file')

:wink:

3 Likes