AWS announced the ability to pause, stop, and restart AWS spot instances while preserving EBS volumes and even RAM state.
I hope I am wrong on this one, but I’ve seen this before from AWS, and again it sounded too good to be true.
You can specify whether Amazon EC2 should hibernate, stop, or terminate Spot Instances when they are interrupted.
I don’t think you can manually hibernate a spot instance, though I hope I am wrong!
And do not misunderstand me: I am a very happy user of AWS, and GPU instances definitely make deep learning much more accessible. But I don’t think AWS is very good at explaining to non-enterprise customers what they mean.
I agree, that’s not exactly the first thing that comes to mind when you hear pause/resume. Spot Start/Stop was released on Sep 18, and the hibernation feature was released yesterday.
But I have still had spot instances terminated in the middle of things a few times. I have high hopes this will let me continue training after a restore.
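For the case where the instance does get interrupted, here is a minimal sketch of checking for the interruption notice from inside the instance, so that a training loop can save its state first. It assumes the `spot/instance-action` instance metadata path and that plain (token-less) metadata requests are allowed on the instance; `parse_instance_action` and `pending_interruption` are names made up for this example.

```python
import json
import urllib.error
import urllib.request

# Instance metadata path for Spot interruption notices (assumed reachable
# without a session token; adjust if the instance enforces IMDSv2).
ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_instance_action(body):
    """Return the pending action ('hibernate', 'stop' or 'terminate'), or None."""
    if not body:
        return None
    try:
        return json.loads(body).get("action")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as "no notice".
        return None


def pending_interruption(timeout=1.0):
    """Ask the metadata service whether an interruption is scheduled."""
    try:
        with urllib.request.urlopen(ACTION_URL, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # A 404 (no interruption scheduled) or no metadata service at all.
        return None
```

In a training loop you could call `pending_interruption()` every few batches and write a checkpoint with whatever save function your framework provides as soon as it returns something other than `None`.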
Potentially it would be possible to simulate this event, for example by setting the spot bid very close to the current price.
Actually, I have observed that in a given region, almost every day the prices of p2’s and p3’s go up at certain times and come back down after about an hour.
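If you want to check that daily pattern yourself, something like the following might do as a rough sketch. It groups spot price history by hour of day (UTC) and keeps the highest price seen in each hour. The helper names, the default instance type, and the region are just assumptions for illustration; `describe_spot_price_history` is the boto3 EC2 call, and it needs AWS credentials to run. Keep in mind that the history lists price changes, not evenly spaced samples.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hourly_max_price(history):
    """Return {hour_of_day_utc: highest spot price recorded in that hour}."""
    peaks = defaultdict(float)
    for record in history:
        hour = record["Timestamp"].astimezone(timezone.utc).hour
        peaks[hour] = max(peaks[hour], float(record["SpotPrice"]))
    return dict(peaks)


def fetch_history(instance_type="p2.xlarge", region="us-east-1", days=7):
    """Download recent spot price records (hypothetical defaults)."""
    import boto3  # imported lazily so hourly_max_price works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    end = datetime.now(timezone.utc)
    paginator = ec2.get_paginator("describe_spot_price_history")
    pages = paginator.paginate(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=end - timedelta(days=days),
        EndTime=end,
    )
    return [rec for page in pages for rec in page["SpotPriceHistory"]]
```

Calling `hourly_max_price(fetch_history())` gives one number per hour of the day, which should make a recurring spike easy to spot.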
And of course, as an option, a NN could be used to predict what max spot price to set so that the instance runs for approximately the desired duration.
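Before reaching for a NN, a much simpler baseline might already be useful. The sketch below is not a neural network, just an empirical quantile: it picks the lowest observed price that would have covered a chosen fraction of past observations. `bid_for_quantile` is a made-up helper, and it assumes past prices say something about future ones, which may not hold.

```python
import math


def bid_for_quantile(prices, quantile=0.95):
    """Smallest observed price that is >= `quantile` of all observations."""
    if not prices:
        raise ValueError("need at least one observed price")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(prices)
    # Nearest-rank quantile: the index of the ceil(q * n)-th smallest value.
    index = math.ceil(quantile * len(ordered)) - 1
    return ordered[index]
```

A higher quantile means fewer interruptions but a higher price ceiling; a lower one accepts more interruptions.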
Unfortunately, again, I think we might be out of luck here.
It is available for persistent Spot requests and Spot Fleets with “maintain” fleet option enabled.
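For reference, here is roughly what a persistent request with an interruption behavior could look like through boto3. Treat it as a sketch under assumptions: the AMI ID, instance type, price, and region are placeholders, `build_spot_request` and `submit` are made-up helper names, and hibernation has its own prerequisites that you should check in the EC2 docs before relying on it.

```python
VALID_BEHAVIORS = ("hibernate", "stop", "terminate")


def build_spot_request(ami_id, instance_type, max_price, behavior="hibernate"):
    """Build keyword arguments for the EC2 request_spot_instances call."""
    if behavior not in VALID_BEHAVIORS:
        raise ValueError("behavior must be one of %s" % (VALID_BEHAVIORS,))
    return {
        "InstanceCount": 1,
        "Type": "persistent",  # the request is kept open after an interruption
        "SpotPrice": str(max_price),
        "InstanceInterruptionBehavior": behavior,
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }


def submit(params, region="us-east-1"):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_spot_request works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.request_spot_instances(**params)
```

Usage would be along the lines of `submit(build_spot_request("ami-xxxxxxxx", "c4.large", 0.05))`, with the placeholder AMI replaced by a real one.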