Has anyone looked at using PyTorch checkpoints during training (e.g. saving model.state_dict() periodically) to allow training on interruptible spot instances, like the new Gradient° Low-Cost instances on Paperspace or AWS spot instances directly?
https://pytorch.org/tutorials/beginner/saving_loading_models.html
Regularly saving the model parameters and optimizer state during training seems like it could be a much more economical way to train larger models. Would it be possible to have direct support for this feature in the fast.ai library?
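For reference, here is a minimal sketch of what spot-instance-friendly checkpointing could look like in plain PyTorch, following the tutorial linked above. The model, optimizer, and `CKPT_PATH` are illustrative placeholders, not part of any fast.ai API: the idea is just to save the model and optimizer state dicts plus the epoch counter after each epoch, and to resume from the latest checkpoint if one exists when the instance restarts.

```python
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoint.pt"  # hypothetical path; in practice, use durable storage

# Toy model/optimizer just to illustrate the pattern
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def save_checkpoint(epoch):
    # Save everything needed to resume: model weights, optimizer state, progress
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, CKPT_PATH)

def load_checkpoint():
    # If the spot instance was interrupted, pick up from the last saved epoch
    if not os.path.exists(CKPT_PATH):
        return 0  # no checkpoint yet: start from scratch
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    return ckpt["epoch"] + 1  # next epoch to run

start_epoch = load_checkpoint()
for epoch in range(start_epoch, 3):
    # ... one epoch of training would go here ...
    save_checkpoint(epoch)
```

One caveat: the checkpoint must live on storage that survives the instance (a persistent volume or object store), since the spot VM's local disk is lost on interruption.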