Switching between free(t2.micro) and paid(p2.xlarge) on AWS

Hey!

I’ve set up both a t2.micro and a p2.xlarge instance on AWS.

I git cloned the fast.ai lessons repo while I was on the (free) t2.micro instance. The directories and all the lessons appear in my AWS instance terminal (ubuntu@ip-ABC:~$). But when I stop the t2.micro instance and switch to the p2.xlarge instance, the bash terminal IP changes (ubuntu@ip-XYZ:~$) and none of the directories which I cloned on the t2.micro are available on the p2.xlarge.

Does this mean I have to git clone the fast.ai repo again? I wanted to just experiment and play on the free instance and only switch to the p2.xlarge when I need GPUs. I thought that whatever directories/notebooks I created on the t2.micro would automatically be there when I shift to the p2.xlarge, but that is not happening. Am I doing something wrong? Or is this how it works, where each instance is like a new directory by itself? New to this - so just testing AWS. Help please. Thanks!

Yes. The local disk on the free-tier t2 instance will be wiped out when you terminate it.

You can use an S3 bucket for the git repo or an EBS (Elastic Block Store) volume in order to have your data available to any AWS instance.

Steeve

2 Likes

Hi @arjunrajkumar … As far as I know, the two instances are separate from each other. It’s like two different computers: if you do something on one computer, the other one is totally unaware of it. That’s why, when you git cloned the repo on the t2.micro instance, it didn’t get transferred to the p2.xlarge - they are different machines with their own memory and storage, and there is no connection between them. Hope that explains your situation.

1 Like

Hi Steeve… It isn’t getting wiped out - when I log back into the t2 instance, the directories are still there.

Thanks! Makes sense.

Yeah the directories will be there of course. Apply the two different computers analogy.

It will be wiped out if you terminate the instance.

Hey @arjunrajkumar, I think you’ll need to re-clone the directory structure on each instance, unless you set up some kind of persistent storage on AWS.

BTW, were you able to access the pytorch libs etc. on the t2.micro? I don’t think I was able to; I’ll be setting up a p2.xlarge anyway.

Got it… I’ve been stopping the instance and not terminating it, so it’s still there.
I’ll check out the EBS volume you mentioned to see if I can share data between instances. That would make things much easier. Thanks!

1 Like

If I understand what you are trying to do (only use the P2 instance when training larger models, and use the T2 for coding and basic testing of samples), then instead of setting up two instances, I have used one instance and just change the instance type.

To do this you need to stop the instance, and then use the AWS console --> EC2 instances screen to change the instance type from T2 to P2 and vice versa as required. I have found that this has worked quite well.
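If you prefer the command line, the same switch should be possible with modify-instance-attribute while the instance is stopped (the instance ID below is just a placeholder):

aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type "{\"Value\": \"p2.xlarge\"}"
aws ec2 start-instances --instance-ids i-0123456789abcdef0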

5 Likes

This sounds like a good idea. When you change the instance type, is the attached volume still the same? I.e. you don’t need to do a git clone every time you change?

I don’t think so. In AWS you have two types of storage. The local HDD will be wiped out if you terminate the instance (it’s mainly storage for the OS).

The best option is to look at an EBS volume; it’s a storage volume that you can detach from one instance and attach to another.

If you put your data in this kind of volume, you are okay.
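Roughly, with the AWS CLI (the size, zone, device and mount point below are just examples; the volume has to be in the same availability zone as the instance, and a normal EBS volume can only be attached to one instance at a time):

# create the volume in the same availability zone as your instances
aws ec2 create-volume --size 50 --volume-type gp2 --availability-zone us-east-1a

# attach it to whichever instance is currently running
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf

# then, on the instance: format it once and mount it
# (on these Ubuntu instances the device usually shows up as /dev/xvdf)
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /data && sudo mount /dev/xvdf /data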

Or use an S3 bucket and give your instance access to that bucket (via an IAM role).
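For the S3 route, a minimal sketch (the bucket name is just an example, and the instance needs an IAM role or credentials that allow it to read/write that bucket):

# push your notebooks/repo to the bucket from one instance
aws s3 sync ~/fastai s3://my-fastai-bucket/fastai

# pull them back down on the other instance
aws s3 sync s3://my-fastai-bucket/fastai ~/fastai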

Steeve

Thanks @bevanc … This is just what I was looking for.

@beecoder You don’t need to git clone the second time when you do it this way. The directories / notebooks etc are all the same between T2 and P2 if you do it the way Bevan mentioned above.

hey arjun, i actually worked on a few scripts to automate these steps over the weekend. it’s kind of like rolling your own crestle — i wanted to do that so i could use the $500 AWS credits we got. sharing these here in case anyone else wants to use these.

basically, the end goal is to be able to

  • start my servers with aws-start
  • stop my servers with aws-end
  • ssh into my servers with aws-ssh
  • check whether my servers are running with aws-check

Both my servers are connected to the same EBS volume, so I work on a jupyter notebook until I’m pretty sure everything kinda works, then I start the GPU and run the experiments.

the only setup you need is:

  1. I started a p2 and m4 via the console and then got their instance-id numbers.
  2. I also got the volume-id from the console of the EBS drive that comes attached to one of them. I unmounted the other.

for aws-check --> save the following to check_aws.sh and alias it in your .bashrc / .bash_profile. do similar things for the commands below.

#!/bin/bash
# list availability zone, state, instance id and type for every instance
aws ec2 describe-instances --query 'Reservations[*].Instances[*].[Placement.AvailabilityZone, State.Name, InstanceId, InstanceType]' --output text
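the aliases themselves can be as simple as this (script paths are just examples):

alias aws-check='bash ~/scripts/check_aws.sh'
alias aws-start='bash ~/scripts/start_aws.sh'
alias aws-ssh='bash ~/scripts/ssh_aws.sh'
alias aws-end='bash ~/scripts/end_aws.sh'

one caveat: plain aliases aren’t visible inside other scripts, so if the nested calls (aws-check inside aws-start, for example) complain about "command not found", put the scripts on your PATH under those names instead of aliasing them.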

for aws-start -->

#!/bin/bash
# show the current state of all instances
aws-check

# choose which instance to bring up
read -p "Start? (cpu/gpu): " AWS_INSTANCE_TYPE

if [ "$AWS_INSTANCE_TYPE" == "cpu" ]
then
  export AWS_INSTANCE_ID=$AWS_CPU_ID
fi
if [ "$AWS_INSTANCE_TYPE" == "gpu" ]
then
  export AWS_INSTANCE_ID=$AWS_GPU_ID
fi

# wait until the shared EBS volume is detached from the other instance,
# attach it as the root device, then boot the instance
aws ec2 wait volume-available --volume-ids $EBS_VOL_ID
aws ec2 attach-volume --device /dev/sda1 --volume-id $EBS_VOL_ID --instance-id $AWS_INSTANCE_ID
aws ec2 start-instances --instance-ids $AWS_INSTANCE_ID
aws ec2 wait instance-running --instance-ids $AWS_INSTANCE_ID

# drop straight into an ssh session
aws-ssh
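a typical run looks something like this (the instance IDs and zone below are made up):

$ aws-start
us-east-1a  stopped  i-0aaaaaaaaaaaaaaaa  m4.xlarge
us-east-1a  stopped  i-0bbbbbbbbbbbbbbbb  p2.xlarge
Start? (cpu/gpu): gpu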

for aws-ssh -->

#!/bin/bash
# look up the instance's current public DNS name (it changes each time the instance starts)
export AWS_PUBLIC_DNS=$(aws ec2 describe-instances --instance-ids $AWS_INSTANCE_ID --filters "Name=instance-id,Values='$AWS_INSTANCE_ID'" --query 'Reservations[*].Instances[*].PublicDnsName' --output text)
# forward port 8888 so the notebook server on the instance is reachable locally
ssh -L localhost:8888:localhost:8888 -i <key-file.pem> ubuntu@$AWS_PUBLIC_DNS -o "StrictHostKeyChecking no"
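once connected, start jupyter notebook on the instance and open http://localhost:8888 in your local browser - the -L option above tunnels that port over ssh.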

for aws-end -->

#!/bin/bash
aws-check

# choose which instance to shut down
read -p "Stop? (cpu/gpu): " AWS_INSTANCE_TYPE
if [ "$AWS_INSTANCE_TYPE" == "cpu" ]
then
  export AWS_INSTANCE_ID=$AWS_CPU_ID
fi
if [ "$AWS_INSTANCE_TYPE" == "gpu" ]
then
  export AWS_INSTANCE_ID=$AWS_GPU_ID
fi
# stop the instance, then detach the shared EBS volume so the other instance can use it
aws ec2 stop-instances --instance-ids $AWS_INSTANCE_ID
aws ec2 wait instance-stopped --instance-ids $AWS_INSTANCE_ID
aws ec2 detach-volume --volume-id $EBS_VOL_ID
aws ec2 wait volume-available --volume-ids $EBS_VOL_ID
echo $AWS_INSTANCE_TYPE stopped.

Remember to also export your GPU instance ID, CPU instance ID and EBS volume ID in your .bash_profile.
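those exports could look something like this (the IDs are just placeholders - use your own from the console):

export AWS_CPU_ID=i-0aaaaaaaaaaaaaaaa
export AWS_GPU_ID=i-0bbbbbbbbbbbbbbbb
export EBS_VOL_ID=vol-0ccccccccccccccc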

It’s not the best it could be. These are still all on-demand instances. Crestle is great because it provisions and manages spot instances for you. But it takes away the headache of remembering to unmount my EBS volume when switching between my GPU and CPU instances, so it’s a start.

Hope these help!

8 Likes