Is it possible to split epochs? I use callbacks to save after each finished epoch, but that’s not enough. My Google Cloud VM keeps getting preempted, and each epoch takes 3 hours 30 minutes because I have a large dataset. I’ve retried the same epoch 15 times in a row, which is a total waste of money, since the VM gets cut off anywhere from 5 to 30 minutes after starting. Is there a way to save the state of an unfinished epoch, just like we do with finished epochs?
Does anyone have a trick?
Here is my 3:30 epoch, started over yet again. I doubt it will last until it finishes.