Splitting an epoch during fit_one_cycle


Is it possible to split epochs? I use callbacks to save the model after each finished epoch, but that’s not enough. My Google Cloud VM keeps being preempted, and each epoch takes 3 hours 30 minutes because I have a large dataset. I’ve retried the same epoch 15 times in a row, which is a total waste of money, as the VM gets cut off anywhere from 5 to 30 minutes after starting. Is there a way to save the state of an unfinished epoch, just as we do with finished epochs?

Does anyone have a trick?

Here is my 3:30 epoch, started over again. I guess it won’t last until it finishes.


Similar topic discussed here: Epochs of arbitrary length
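
The general idea of treating one long epoch as several short ones could look roughly like the sketch below. It is plain Python, not fastai API: `run_epoch`, `train_chunk`, and the `progress.json` file are hypothetical names made up for illustration. `train_chunk` stands in for whatever trains on a slice of the dataset and saves the model weights. The sketch only records which chunk to resume from; it does not restore the optimizer state or the position in the one-cycle learning-rate schedule, which you would need to handle separately.

```python
import json
import os


def chunk_ranges(n_items, chunk_size):
    """Split range(n_items) into consecutive (start, stop) chunks."""
    return [(start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)]


def load_progress(path):
    """Return the index of the next chunk to train on (0 if nothing is saved)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_chunk"]


def save_progress(path, next_chunk):
    """Write to a temp file, then rename, so a preemption mid-write
    cannot leave a half-written progress file behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_chunk": next_chunk}, f)
    os.replace(tmp, path)


def run_epoch(n_items, chunk_size, progress_path, train_chunk):
    """Run one pass over the data in chunks, resuming after the last
    chunk that finished before the previous interruption."""
    chunks = chunk_ranges(n_items, chunk_size)
    for i in range(load_progress(progress_path), len(chunks)):
        start, stop = chunks[i]
        # Hypothetical user function: train on items [start, stop)
        # and save the model weights before returning.
        train_chunk(start, stop)
        save_progress(progress_path, i + 1)
    # The whole pass is done: clear progress so the next epoch starts at chunk 0.
    if os.path.exists(progress_path):
        os.remove(progress_path)
```

With a chunk size that takes only a few minutes to train, a preemption costs at most one chunk of work instead of the whole 3:30 epoch. Note that the chunks must be the same on every restart, so any shuffling has to use a fixed seed.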

Thank you so much for the link.
I’ll have a look right away.