NOTICE
The original post was migrated to the wiki.
You might also refer to this medium article.
Thank you! @slavivanov
I was getting errors about a missing .aws.creds file (my.conf was generated automatically by the 1st approach).
Do I need to set it up myself in the way you described elsewhere?
2) Create .aws.creds with your actual IAM credentials with EC2 privileges in this format:
AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX
AWSSecretKey=XXXXXXXXXXXXXXXXXXXXXXXXXX
Hi @xinxin.li.seattle,
Sorry about that!
The ondemand_to_spot script creates the .aws.creds file using the same approach as setup_p2.sh (using aws configure get aws_access_key_id and aws configure get aws_secret_access_key).
If these were not set when you ran ondemand_to_spot.sh (e.g. you haven't run aws configure), you can create .aws.creds in ec2-spotter using this template:
AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX
AWSSecretKey=XXXXXXXXXXXXXXXXXXXXXXXXXX
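For anyone filling in the template by hand, here is a minimal sketch of a helper that writes the file and refuses to continue when either value is empty (which is what happens if aws configure was never run). The function name write_aws_creds and the default target path are my own assumptions, not part of the ec2-spotter scripts.

```shell
# write_aws_creds KEY SECRET [PATH]  (hypothetical helper, not from the original scripts)
# Writes the two-line credentials file in the format shown in the template above.
write_aws_creds() {
  key="$1"
  secret="$2"
  target="${3:-ec2-spotter/.aws.creds}"   # assumed default location

  # Refuse to write a file with blank values.
  if [ -z "$key" ] || [ -z "$secret" ]; then
    echo "AWS credentials are empty; run 'aws configure' first" >&2
    return 1
  fi

  # The subshell keeps the restrictive umask local, so the new file is
  # readable by the owner only.
  (
    umask 077
    printf 'AWSAccessKeyId=%s\nAWSSecretKey=%s\n' "$key" "$secret" > "$target"
  )
}
```

It could be called as write_aws_creds "$(aws configure get aws_access_key_id)" "$(aws configure get aws_secret_access_key)" from the directory that contains ec2-spotter.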
PS: Also I missed some crucial steps in the existing instance approach, which I just updated.
Thank you, this will be quite helpful. I am not very familiar with AWS, so this question might seem stupid; it is more a request for clarification. Is the EBS volume still running when the spot instance stops? If so, keeping it running will cost money, right? And how much does it cost on average?
Hey @Saiyan!
Yes, you pay for the EBS volume regardless of whether it is attached to an instance. Currently it's $0.10/GB-month. This means that if you have a 100GB volume for a full month, it will cost you $10, which IMO is not that much.
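The arithmetic above can be sketched as a tiny helper for other volume sizes. The function name ebs_monthly_cost is made up for illustration, and the default price is just the $0.10/GB-month figure quoted in this reply; check current AWS pricing for your region and volume type.

```shell
# ebs_monthly_cost SIZE_GB [PRICE_PER_GB_MONTH]  (hypothetical helper)
# Multiplies volume size by the per-GB monthly price; awk handles the decimals.
ebs_monthly_cost() {
  awk -v gb="$1" -v price="${2:-0.10}" 'BEGIN { printf "%.2f\n", gb * price }'
}

ebs_monthly_cost 100   # prints 10.00, matching the $10 example above
```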
Thanks for the clarification and speedy reply
@slavivanov Thank you for fixing the script. It is working for me very well!!
One piece of advice: add a word of caution to approach 1 (fresh instance), because step 3 terminates not just the instance from step 1 but all of your existing instances created with the fastai setup script. Luckily, I always back up my data and code in the cloud, so nothing was lost. Because p2.xlarge is approved with a limit, those with a small limit should be very careful not to accidentally terminate their only approved instance. Other than that, this script works exceptionally well and is very easy to follow. I highly recommend it. Great job and thank you for sharing it @slavivanov!
Thanks @xinxin.li.seattle!
The script will use (and terminate) an instance named "fast-ai-gpu-machine", which might not be the instance that was just launched. I'll add a note about this.
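Until such a note is in place, one way to see what would be affected is to list the running instances carrying that Name tag before running the conversion. This is only a sketch: the two function names are hypothetical, and it assumes the script selects instances by the Name tag, which is how I read the reply above.

```shell
# List the IDs of running instances whose Name tag matches
# (default: fast-ai-gpu-machine, the name mentioned above).
list_named_instances() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${1:-fast-ai-gpu-machine}" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Succeed (and print the ID) only when exactly one instance matches,
# so there is no doubt about which machine would be terminated.
confirm_single_instance() {
  name="${1:-fast-ai-gpu-machine}"
  ids="$(list_named_instances "$name")" || return 1
  set -- $ids   # split the whitespace-separated IDs into positional parameters
  if [ "$#" -eq 1 ]; then
    printf '%s\n' "$1"
  else
    echo "Expected exactly one running instance named '$name', found $#: $ids" >&2
    return 1
  fi
}
```

Running confirm_single_instance before the conversion script gives a chance to rename or stop anything you want to keep.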
Thanks a lot for this!
I'm getting this error when trying to run bash start_spot.sh:
parse error: Invalid numeric literal at line 1, column 8
parse error: Invalid numeric literal at line 1, column 8
It seems to be related to jq. The spot instance seems to otherwise load fine.
I'm running Ubuntu 16.04.1 LTS.
Hi, @z0k
I probably forgot to specify the output type. I've pushed a commit to GitHub for this.
Let me know if it works for you.
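For context, jq reports "Invalid numeric literal" when its input is not JSON, which can happen if the AWS CLI default output format is set to text or table. A minimal sketch of forcing JSON for any call that is piped into jq follows; the wrapper name aws_json is my own and is not taken from the actual commit.

```shell
# aws_json ARGS...  (hypothetical wrapper)
# Runs the AWS CLI with JSON output regardless of the configured default,
# so the result can be parsed by jq.
aws_json() {
  aws "$@" --output json
}

# Illustrative usage (not run here):
#   aws_json ec2 describe-spot-instance-requests | jq -r '.SpotInstanceRequests[0].InstanceId'
```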
Thanks a lot! I'll let you know the next time I spin up a spot instance.
Hey, I've tried setting it all up but I get the following error:
ondemand_to_spot.sh: 7: export: i-0278bf10da31b66a9: bad variable name
I suspect some small change in the bash script would do, but I'm still not sure what that should be. Could you please look into that?
Thanks a lot!
EDIT 1:
Using a temporary fix (substituting the instance ID directly in the script) worked, but then this was the output:
TERMINATINGINSTANCES i-0016ed57539ce3077
CURRENTSTATE 32 shutting-down
PREVIOUSSTATE 16 running
Waiting for volume to become available.
ondemand_to_spot.sh: 91: ondemand_to_spot.sh: cannot create ec2-spotter/.aws.creds: Directory nonexistent
All done, you can start your spot instance with: sh start_spot.sh
Then, when I tried to do sh start_spot.sh, it stated the following:
start_spot.sh: 5: start_spot.sh: Bad substitution
…/ec2spotter-launch: line 38: .aws.creds: No such file or directory
Spot request ID:
Waiting for spot request to be fulfilled…
Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded
Waiting for spot instance to start up…
Waiter InstanceRunning failed: Waiter encountered a terminal failure state
Spot instance ID:
Please allow the root volume swap script a few minutes to finish.
Then connect to your instance: ssh -i /home/slazien/.ssh/aws-key-fast-ai.pem ubuntu@
I'm not sure what that could be, and I'm not sure which variable name from the first issue could be wrong…
EDIT 2:
So I managed to fix my first issue (getting the instance ID), but I'm still stuck at "ondemand_to_spot.sh: 91: ondemand_to_spot.sh: cannot create ec2-spotter/.aws.creds: Directory nonexistent", even though I created the directory manually…
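A possible explanation for the "bad variable name" and "Bad substitution" messages, offered as a guess rather than a diagnosis: both are typical of running a bash script with sh, which is dash on Ubuntu. In older dash versions, export with an unquoted command substitution splits the output into words, so a second instance ID gets treated as a variable name. The sketch below shows a quoted assignment that behaves the same in both shells; lookup_instance_ids is a hypothetical stand-in for the real AWS CLI lookup.

```shell
# Hypothetical stand-in for the AWS CLI lookup; it prints two IDs to mimic
# an account with more than one matching instance.
lookup_instance_ids() { printf 'i-aaaa1111 i-bbbb2222\n'; }

# Assign with quotes first, then export. The whole output stays in one
# variable instead of being split into extra arguments to export.
instance_ids="$(lookup_instance_ids)"
export instance_ids

# Keep only the first ID (everything before the first space).
first_instance_id="${instance_ids%% *}"
echo "$first_instance_id"

# Report which shell is interpreting the script; bash-only syntax needs bash.
if [ -n "${BASH_VERSION:-}" ]; then
  echo "running under bash"
else
  echo "not running under bash; try: bash start_spot.sh"
fi
```

If that guess is right, starting the scripts with bash instead of sh should avoid both messages.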
I think the script assumes that you're running in the fast_ai directory, so try changing this line
export aws_credentials_file=ec2-spotter/.aws.creds
to the following
export aws_credentials_file=../.aws.creds
Instead of running the script again though, I think it should work if you just manually create the .aws.creds file in the ec2-spotter directory as follows:
export aws_key=`aws configure get aws_access_key_id`
export aws_secret=`aws configure get aws_secret_access_key`
cat > .aws.creds <<EOL
AWSAccessKeyId=$aws_key
AWSSecretKey=$aws_secret
EOL
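Since the underlying problem is a path that depends on the directory you run from, another option is to build the path from the script's own location. This is a sketch only; resolve_from_script is a hypothetical helper and is not how the published scripts work.

```shell
# resolve_from_script SCRIPT_PATH RELATIVE_PATH  (hypothetical helper)
# Prints RELATIVE_PATH anchored at the directory that contains SCRIPT_PATH,
# so the result does not depend on the caller's working directory.
resolve_from_script() {
  script_dir="$(cd "$(dirname "$1")" && pwd)" || return 1
  printf '%s/%s\n' "$script_dir" "$2"
}

# Illustrative usage inside a script (not run here):
#   aws_credentials_file="$(resolve_from_script "$0" .aws.creds)"
#   export aws_credentials_file
```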
Hi @slazien, sorry about this!
@z0k is exactly right. The ondemand_to_spot file was previously in a different folder. Follow his instructions to get this solved.
(I've also pushed a fix for this to GitHub).
Hey @z0k and @slavivanov!
Thank you so much for your responses, changing that line (why didn't I notice that myself?) fixed it all. There is still an error when running start_spot.sh (start_spot.sh: 5: start_spot.sh: Bad substitution), but it seems to work fine.
EDIT: so after terminating the on-demand instance and converting it to spot with the script it turns out nvidia-smi is not working, which is strange:
modprobe: ERROR: …/libkmod/libkmod.c:514 lookup_builtin_file() could not open builtin file '/lib/modules/4.4.0-64-generic/modules.builtin.bin'
modprobe: ERROR: …/libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-64-generic/modules.dep.bin'
modprobe: ERROR: …/libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-64-generic/modules.dep.bin'
modprobe: ERROR: …/libkmod/libkmod-module.c:832 kmod_module_insert_module() could not find module by name='nvidia_367'
modprobe: ERROR: could not insert 'nvidia_367': Unknown symbol in module, or unknown parameter (see dmesg)
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Also, while trying to apt-get update it says dpkg was interrupted, ugh…
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
Did any of you have a similar problem?
EDIT 2: After fixing dpkg nvidia-smi seems to work fine.
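For anyone hitting the same thing, here is a small sketch of a check to run after sudo dpkg --configure -a has finished. The function name gpu_driver_ok is hypothetical; the probe command defaults to nvidia-smi and can be swapped out, which is only there to make the function easy to try without a GPU.

```shell
# gpu_driver_ok [PROBE_COMMAND]  (hypothetical helper)
# Runs the probe (nvidia-smi by default) and reports whether it succeeded.
gpu_driver_ok() {
  probe="${1:-nvidia-smi}"
  if "$probe" >/dev/null 2>&1; then
    echo "GPU driver is responding"
  else
    # The repair command below is the one quoted in the dpkg error above.
    echo "GPU driver is not responding; check 'sudo dpkg --configure -a' and retry"
    return 1
  fi
}
```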
I'm glad you managed to get it working. I haven't encountered this error.
are there part 2 scripts for this?
This is awesome work, well done. It will save me millions over the next few years.
I've spent several hours installing everything, and it is now configured so that the instances launch; I've also worked out how to mount the instance.
One question: I don't have Jupyter Notebook installed, and when I do install it, it routes to localhost.
Also, nvidia-smi doesn't seem to work, so I'm wondering if I need to install a bunch of scripts?
Any thoughts?