Setup problems: AWS

For those of you stuck trying to set up AWS and willing to try something else, I’m accepting beta signups for Crestle, which makes all of this easier in a few ways:

  • One-click GPU-backed Jupyter notebooks with all common DL libraries and tools preinstalled
  • K80 GPUs for $0.34/hour (billed by the second, first 25 hours free)
  • A cloud home directory to save all your work
  • A terminal to run jobs/scripts and manipulate files
  • fast.ai part 1 notebooks pre-loaded

You can sign up for an invite at https://www.crestle.com. Hope it’s helpful!

I’m having an issue getting set up on Windows 10. I’m using the Ubuntu bash shell and have configured AWS, but when I run setup_p2.sh I get the following error:

setup_p2.sh: line 7: [: =: unary operator expected
setup_p2.sh: line 9: [: =: unary operator expected
setup_p2.sh: line 11: [: =: unary operator expected

Anyone know what’s going wrong?

Thanks
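
For anyone hitting the same message: bash prints `[: =: unary operator expected` when an unquoted variable inside `[ ... ]` expands to nothing, so the test collapses to `[ = value ]`. The sketch below reproduces that with a hypothetical variable named `region`; which variable setup_p2.sh actually compares on line 7 is an assumption here, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical variable: stands in for whatever value the failing test
# in setup_p2.sh compares (an assumption). Empty on purpose.
region=""

# Unquoted: expands to `[ = us-west-2 ]`, which is a malformed test.
# The error message is silenced here; only the exit status is kept.
[ $region = "us-west-2" ] 2>/dev/null
unquoted_status=$?

# Quoted: expands to `[ "" = us-west-2 ]`, a valid test that is simply false.
[ "$region" = "us-west-2" ]
quoted_status=$?

echo "unquoted exit status: $unquoted_status"
echo "quoted exit status:   $quoted_status"
```

If this is the cause, it may be worth checking that `aws configure get region` prints a value in the same shell before running the script again.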

When I attempt to ssh into the AWS instance created by running setup_p2.sh, I find that aws-key-fast-ai.pem is password protected, and I don’t know what the password is.

I’ve attempted to delete the aws-key-fast-ai.pem file, run the fast-ai-remove.sh script, and start over, but I get the same result.

Any suggestions? I’m on a Mac.

Solved the problem.

My aws-key-fast-ai.pem file wasn’t formatted properly.

This resulted from an InvalidKeyPair.Duplicate error on this line of the setup script, which left aws-key-fast-ai.pem empty.

To solve this I manually deleted the key pair according to the instructions for starting over with AWS. I also opened a ticket on GitHub to automate this using the fast-ai-remove.sh script …
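
A quick way to check for this situation is sketched below. It only tests whether the key file is non-empty and carries a PEM private-key header; `pem_looks_valid` is a hypothetical helper name, and the cleanup commands in the comments assume the key pair name used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the file is non-empty and contains
# a PEM private-key header line.
pem_looks_valid() {
    local key_file="$1"
    [ -s "$key_file" ] && grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$key_file"
}

key_file="${HOME}/.ssh/aws-key-fast-ai.pem"   # path used in the posts above
if pem_looks_valid "$key_file"; then
    echo "$key_file looks like a private key"
else
    echo "$key_file is missing, empty, or not a PEM private key"
    # Possible cleanup before re-running setup (run by hand; assumes the
    # key pair name from this thread):
    #   aws ec2 delete-key-pair --key-name aws-key-fast-ai
    #   rm -f "$key_file"
fi
```

An empty or malformed key file may also explain the password prompt, since ssh can ask for a passphrase when it is unable to parse a key.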

Thanks for the follow up.
I’ve actually figured out, with the help of a friend, how to create a new bash script that can do the same thing as the aws-start command and am now using that.
Appreciate the help though :slight_smile:

Hello,

I’ve been trying to set this up, but I’ve been stuck at the setup_p2 part for almost three hours now. I’ve tried what feels like everything and I’m posting here pretty much out of desperation. I’m on Windows 10 with Cygwin.

I got both of these files from GitHub by running:

wget https://raw.githubusercontent.com/fastai/courses/master/setup/setup_p2.sh
wget https://raw.githubusercontent.com/fastai/courses/master/setup/setup_instance.sh

My config file is as follows:

[default]
output = text
region = us-west-2

I was able to run aws configure without incident.

However, trying to run these scripts returns errors.

A suggestion in this thread was to run:

export ami="ami-bc508adc"
export instanceType="p2.xlarge"
bash setup_instance.sh

Doing this, however, results in a series of invalid or malformed ID errors, which look like this once I uncomment set -x:

$ bash setup_instance.sh
+ '[' -z ami-bc508adc ']'
+ '[' -z p2.xlarge ']'
+ export name=fast-ai
+ name=fast-ai
+ export cidr=0.0.0.0/0
+ cidr=0.0.0.0/0
+ hash aws
+ '[' 0 -ne 0 ']'
++ aws configure get aws_access_key_id
+ '[' -z $'*****************S3Q\r' ']'
++ aws ec2 create-vpc --cidr-block 10.0.0.0/28 --query Vpc.VpcId --output text
+ export $'vpcId=vpc-fdbdae9a\r'
+ vpcId=$'vpc-fdbdae9a\r'
+ aws ec2 create-tags --resources $'vpc-fdbdae9a\r' --tags --tags Key=Name,Value=fast-ai
An error occurred (InvalidID) when calling the CreateTags operation: The ID 'vpc' is not valid
+ aws ec2 modify-vpc-attribute --vpc-id $'vpc-fdbdae9a\r' --enable-dns-support '{"Value":true}'
+ aws ec2 modify-vpc-attribute --vpc-id $'vpc-fdbdae9a\r' --enable-dns-hostnames '{"Value":true}'
++ aws ec2 create-internet-gateway --query InternetGateway.InternetGatewayId --output text
+ export $'internetGatewayId=igw-7c63a91b\r'
+ internetGatewayId=$'igw-7c63a91b\r'
+ aws ec2 create-tags --resources $'igw-7c63a91b\r' --tags --tags Key=Name,Value=fast-ai-gateway
An error occurred (InvalidID) when calling the CreateTags operation: The ID 'igw' is not valid
+ aws ec2 attach-internet-gateway --internet-gateway-id $'igw-7c63a91b\r' --vpc-id $'vpc-fdbdae9a\r'
++ aws ec2 create-subnet --vpc-id $'vpc-fdbdae9a\r' --cidr-block 10.0.0.0/28 --query Subnet.SubnetId --output text
+ export $'subnetId=subnet-44e2cd23\r'
+ subnetId=$'subnet-44e2cd23\r'
+ aws ec2 create-tags --resources $'subnet-44e2cd23\r' --tags --tags Key=Name,Value=fast-ai-subnet
' is not validred (InvalidID) when calling the CreateTags operation: The ID 'subnet-44e2cd23
++ aws ec2 create-route-table --vpc-id $'vpc-fdbdae9a\r' --query RouteTable.RouteTableId --output text
+ export $'routeTableId=rtb-9cf165fa\r'
+ routeTableId=$'rtb-9cf165fa\r'
+ aws ec2 create-tags --resources $'rtb-9cf165fa\r' --tags --tags Key=Name,Value=fast-ai-route-table
' is not validred (InvalidID) when calling the CreateTags operation: The ID 'rtb-9cf165fa
++ aws ec2 associate-route-table --route-table-id $'rtb-9cf165fa\r' --subnet-id $'subnet-44e2cd23\r' --output text
+ export $'routeTableAssoc=rtbassoc-2457c75d\r'
+ routeTableAssoc=$'rtbassoc-2457c75d\r'
+ aws ec2 create-route --route-table-id $'rtb-9cf165fa\r' --destination-cidr-block 0.0.0.0/0 --gateway-id $'igw-7c63a91b\r'
"n error occurred (InvalidRouteTableId.Malformed) when calling the CreateRoute operation: Invalid id: "rtb-9cf165fa
++ aws ec2 create-security-group --group-name fast-ai-security-group --description 'SG for fast.ai machine' --vpc-id $'vpc-fdbdae9a\r' --query GroupId --output text
+ export $'securityGroupId=sg-9a82dce1\r'
+ securityGroupId=$'sg-9a82dce1\r'
+ aws ec2 authorize-security-group-ingress --group-id $'sg-9a82dce1\r' --protocol tcp --port 22 --cidr 0.0.0.0/0
"n error occurred (InvalidGroupId.Malformed) when calling the AuthorizeSecurityGroupIngress operation: Invalid id: "sg-9a82dce1
+ aws ec2 authorize-security-group-ingress --group-id $'sg-9a82dce1\r' --protocol tcp --port 8888-8898 --cidr 0.0.0.0/0
"n error occurred (InvalidGroupId.Malformed) when calling the AuthorizeSecurityGroupIngress operation: Invalid id: "sg-9a82dce1
+ '[' '!' -d /home/Eryk/.ssh ']'
+ '[' '!' -f /home/Eryk/.ssh/aws-key-fast-ai.pem ']'
+ aws ec2 create-key-pair --key-name aws-key-fast-ai --query KeyMaterial --output text
+ chmod 400 /home/Eryk/.ssh/aws-key-fast-ai.pem
++ aws ec2 run-instances --image-id ami-bc508adc --count 1 --instance-type p2.xlarge --key-name aws-key-fast-ai --security-group-ids $'sg-9a82dce1\r' --subnet-id $'subnet-44e2cd23\r' --associate-public-ip-address --block-device-mapping '[ { "DeviceName": "/dev/sda1", "Ebs": { "VolumeSize": 128, "VolumeType": "gp2" } } ]' --query 'Instances[0].InstanceId' --output text
An error occurred (InvalidGroupId.Malformed) when calling the RunInstances operation: Invalid id: "sg-9a82dce1
"
+ export instanceId=
+ instanceId=
+ aws ec2 create-tags --resources --tags --tags Key=Name,Value=fast-ai-gpu-machine
An error occurred (MissingParameter) when calling the CreateTags operation: The request must contain the parameter resourceIdSet
++ aws ec2 allocate-address --domain vpc --query AllocationId --output text
+ export $'allocAddr=eipalloc-aeade794\r'
+ allocAddr=$'eipalloc-aeade794\r'
+ echo Waiting for instance start...
Waiting for instance start...
+ aws ec2 wait instance-running --instance-ids

I’m pretty confused since I’ve completely started over with these instructions five or six times and I’m pretty sure I’ve been doing everything exactly as the instructions say, but it looks like there’s some sort of syntax issue that’s throwing all of these errors.

I really just want to proceed with the course and I’m starting to run low on ideas on how to tackle this, so I would really appreciate the help :slight_smile:
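
One thing the trace does show: every ID captured from an aws call ends in `\r` (for example `$'vpc-fdbdae9a\r'`), so the IDs passed to later calls carry a trailing carriage return. That would be consistent with a Windows build of the AWS CLI being called from Cygwin, though that cause is an assumption. A minimal sketch of stripping the character is below; `strip_cr` is a hypothetical helper, and `printf` stands in for the real aws call, which is not run here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove carriage returns from a command's output.
strip_cr() {
    tr -d '\r'
}

# Simulated CLI output with a Windows line ending. This stands in for
# `aws ec2 create-vpc ... --output text`, which is not executed here.
raw_vpc_id=$(printf 'vpc-fdbdae9a\r\n')
clean_vpc_id=$(printf 'vpc-fdbdae9a\r\n' | strip_cr)

# The raw value is one character longer because of the trailing \r.
echo "raw length:   ${#raw_vpc_id}"
echo "clean length: ${#clean_vpc_id}"
```

Under this assumption, each `$(aws ...)` substitution in setup_instance.sh would need the same `| tr -d '\r'` treatment. Running the script from an environment with a Linux-native CLI would avoid the problem altogether.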

Hey!
I have a new MacBook Pro, and every time I try to connect to my instance using ssh and the given link, the connection to my instance times out within seconds. I tried to disable the QoS setting on my MacBook by adding this to my ssh config file, which was otherwise empty:

Host A
IPQoS 0

But I still cannot connect to the instance. It’s been more than a week now. If someone could help me out or tell me how they got this to work on their Mac, I would be very grateful. Thanks!

I want to ask: do I really need an AWS account? Assuming I can set up Jupyter Notebook and the other packages on my own Linux box, will that be enough? Is the only potential reason that a GPU is much faster? About how much faster? Can we use a course example to quantify it?

I am worried that with this approach I will be locked in to AWS forever. I don’t like the idea of paying for GPU time just to run a program.

Sorry that this is not a setup problem. I don’t know where else to ask this kind of question.

On my reasonably fast Core i7 Ubuntu machine, running the main training process from lesson 1 on the CPU took about 2 hours. On the AWS p2.xlarge instance using the GPU, it takes about 10 minutes. So the difference for me was about 12X faster with AWS.

If you choose to use your local machine instead, you will find that it severely limits the kind of experiments you can run with the data. That will really undermine your efforts to understand what is going on.

If the cost is an issue, there is information on the wiki about using spot instances from AWS instead of on-demand instances. Using that, you can get the hourly cost down from $0.90/hour to the range of $0.13-$0.28 depending on which region you are using.

My recommendation: Go with AWS for now. You could always build a GPU machine for yourself later if you want.
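
To put the figures above side by side, here is a rough back-of-the-envelope calculation. The hourly rates and timings are the ones quoted in this post; the 20 hours of usage is a made-up number for illustration, and `cost` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cost in dollars for a number of hours at an hourly rate.
cost() {
    LC_ALL=C awk -v hours="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", hours * rate }'
}

hours=20   # hypothetical amount of GPU time, not a figure from the thread

# 2 hours (120 minutes) on the CPU vs. 10 minutes on the GPU.
echo "speedup:        $(( 120 / 10 ))x"
echo "on-demand:      \$$(cost "$hours" 0.90)"
echo "spot, low end:  \$$(cost "$hours" 0.13)"
echo "spot, high end: \$$(cost "$hours" 0.28)"
```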

I’m on Windows 10 with Cygwin. I’m having the exact same problem and have spent a whole day trying to figure this out. ambisinster, did you make any progress on this?

mhogl@DESKTOP-J5CO4AN ~
$ aws configure get region
us-east-1

mhogl@DESKTOP-J5CO4AN ~
$ bash setup_p2.sh
Only us-west-2 (Oregon), eu-west-1 (Ireland), and us-east-1 (Virginia) are currently supported

No luck so far, @mhoglund.

I agree that experimentation is critical, but I’d disagree that using a local machine harms experimentation. In fact, I think it’s the opposite. I spent less than $350 to build a DL machine; it outperforms AWS on the Lesson 1 benchmark (~400s vs ~600s.) I really like knowing that the marginal cost of trying something out is zero (setting aside electricity). GPUs aren’t really that expensive, so I’d definitely recommend at least examining the local machine option first.

To be clear, I mean that using a local machine without a GPU will severely limit your ability to experiment. When it takes 2 hours to train a single epoch, you’re just not going to be willing to try many alternatives.

If you have a local machine with a GPU, then you don’t have that problem.

Try using Ubuntu Bash for Windows instead.

Thank you. I solved my issue by restoring to factory settings and installing Ubuntu Bash for Windows.

I keep getting this error when trying to link the source code. I have a new Mac. How can I fix it?

Julians-MacBook-Air:~ julianneuman$ wget http://www.platform.ai/files/aws-alias.sh
-bash: wget: command not found
Julians-MacBook-Air:~ julianneuman$

Thank you Max!

You need to install wget. The easiest way to do that on a Mac is to install brew using instructions here: https://brew.sh/. Brew is also pretty much the standard if you want to install other Unixy packages (like awscli).
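
If you would rather not install anything, macOS also ships with curl, which can do the same job. The sketch below picks whichever tool is present; `pick_downloader` and `fetch` are hypothetical helper names, and nothing is downloaded when the snippet runs.

```shell
#!/usr/bin/env bash
# Print the name of the first available download tool, or "none".
pick_downloader() {
    if command -v wget >/dev/null 2>&1; then
        echo wget
    elif command -v curl >/dev/null 2>&1; then
        echo curl
    else
        echo none
    fi
}

# Download a URL into the current directory with whichever tool exists.
fetch() {
    case "$(pick_downloader)" in
        wget) wget "$1" ;;
        # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
        curl) curl -fLO "$1" ;;
        *)    echo "install wget (e.g. brew install wget) or curl" >&2; return 1 ;;
    esac
}

echo "downloader available here: $(pick_downloader)"
# Example (not run here): fetch http://www.platform.ai/files/aws-alias.sh
```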

I am getting this error. (The initial setup_p2.sh that I ran had the same problem; this is a different script.) I think it is downloading some HTML code instead of the pure script. I’m not sure why this is happening. How can I fix it?
I have a Mac, by the way.

Julians-MacBook-Air:~ julianneuman$ bash setup_p2.sh.1
setup_p2.sh.1: line 7: syntax error near unexpected token `newline'
setup_p2.sh.1: line 7:
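
The error is consistent with the guess above: a line of HTML ending in `>` is the kind of input that makes bash complain about an unexpected newline. One way to check is sketched below. `looks_like_html` is a hypothetical helper that only looks for HTML markers in the first 20 lines, so treat it as a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the first lines of the file
# contain an HTML doctype or an opening <html tag.
looks_like_html() {
    head -n 20 "$1" | grep -qiE '<!doctype html|<html'
}

file="setup_p2.sh.1"   # file name from the post above
if [ ! -f "$file" ]; then
    echo "$file not found in the current directory"
elif looks_like_html "$file"; then
    echo "$file looks like an HTML page; re-download it from the raw URL"
else
    echo "$file does not look like HTML; first line: $(head -n 1 "$file")"
fi
```

If it does turn out to be HTML, downloading from the raw.githubusercontent.com address quoted earlier in this thread, rather than from a GitHub page view, should give the plain script.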

Hi @julianneuman,
Can you try this in your shell?