How to `.fit()` then close the Mac for a while

When I call .fit(), it sometimes needs to run for a very long time (half an hour). But if I close my Mac during that time, the process breaks with a WebSocket timeout error because the browser is closed (even though I had tmux attached). Is there any way to leave it running while I close my Mac and drive away?

You have tmux running on the remote server (not local) before you start the process, right? (Or screen or byobu are alternatives.)

It should automatically detach if your SSH session ends, but you can also try manually detaching and exiting. When you return and reattach, your process should still be running.

Yes, tmux is running on AWS. The Jupyter notebook itself did not die, but the .fit() run failed on the kernel. (I ran it for so many epochs that it would take half an hour to finish.) The console said that there was a WebSocket timeout. I guess the web browser was suspended when I closed the Mac's lid. I wonder how people handle a long-running .fit() on the go.

Oh, I see what’s happening now. These are two separate problems.

  1. The WebSocket timeout error on your terminal isn’t an issue; everything is still running on the remote server. Just log back in to your remote server.

  2. Jupyter notebook is not the right place to run something for a long time if you need to put your computer to sleep. You have two reasonable options:
    a) Save your script to the server as a .py file and run it as a python script.
    b) Write your script to save your model periodically and automatically restart from the last save when it is stopped. You’ll probably want to use the ModelCheckpoint callback from Keras for that.


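import os

from keras.models import load_model
from keras.callbacks import ModelCheckpoint

# Checkpoint path. ModelCheckpoint saves the full model here by default,
# not just the weights, which is why load_model can read it back.
OP = 'weights.h5'

# Throughout, **x stands in for your own keyword arguments.

def build_model(**x):
    # Placeholder: build and return your (uncompiled) model.
    pass

def get_generator(**x):
    # Placeholder: return your training data generator.
    pass

if os.path.exists(OP):
    # Resume from the last checkpoint, which comes back already compiled.
    model = load_model(OP)
else:
    # Start fresh. Only a new model needs compiling; recompiling a loaded
    # one would throw away the saved optimizer state.
    model = build_model()
    model.compile(**x)

mygen = get_generator(**x)

# Save a checkpoint to OP at the end of every epoch.
# (Newer Keras versions deprecate fit_generator; fit accepts generators.)
model.fit_generator(mygen, callbacks=[ModelCheckpoint(OP)], **x)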
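
One thing the skeleton above doesn't track is which epoch you stopped at, so a restart trains for the full number of epochs again. Here is a minimal sketch of that bookkeeping using only the standard library. The file name `train_state.json` and the helpers `load_state`, `save_state` and `run` are made up for illustration, and `train_one_epoch` is a stand-in for whatever your real training step is.

```python
import json
import os

STATE_FILE = 'train_state.json'  # hypothetical file name

def load_state(path=STATE_FILE):
    # Return the saved progress, or a fresh state if nothing was saved yet.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {'epoch': 0}

def save_state(state, path=STATE_FILE):
    # Write to a temp file first, then rename, so a crash mid-write
    # cannot leave a half-written state file behind.
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(state, f)
    os.replace(tmp, path)

def run(total_epochs, train_one_epoch, path=STATE_FILE):
    # Resume after the last completed epoch and record progress after each one.
    state = load_state(path)
    for epoch in range(state['epoch'], total_epochs):
        train_one_epoch(epoch)
        state['epoch'] = epoch + 1
        save_state(state, path)
    return state
```

With Keras you could probably get the same effect by passing the saved epoch number as `initial_epoch` to `fit` / `fit_generator`, alongside the ModelCheckpoint callback.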


Ah okay. I didn’t know that people mix the console & the notebook Python experience. Thanks! More things to discover on Keras too.

David is right about the WebSocket error: it has to do with your laptop’s connection to EC2, not the Jupyter process running on the instance.

An alternative way to run background processes is with nohup. No need to use tmux.

nohup your-command-here &
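
If you would rather start the job from Python than from the shell, something along these lines should have a similar effect. It is only a sketch: `launch_detached` is a made-up helper name, and it is not what nohup does internally (nohup ignores the hangup signal, whereas this puts the child in its own session so it is no longer attached to your terminal). It assumes a POSIX system such as the Linux box on EC2.

```python
import subprocess

def launch_detached(cmd, log_path):
    # Start cmd in its own session so it is not tied to this terminal,
    # and append its stdout and stderr to a log file (much like nohup.out).
    log = open(log_path, 'ab')
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX only: calls setsid() in the child
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return proc
```

You would call it as `launch_detached(['python', 'train.py'], 'train.log')`, where `train.py` and `train.log` are hypothetical names, and then check the log file when you get back.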

Also you can run jupyter notebooks from the command line using nbconvert and save your errors/outputs.

jupyter nbconvert \
  --to notebook --execute redux.ipynb \
  --output output_redux.ipynb \
  --ExecutePreprocessor.timeout=-1 \
  --Application.log_level="DEBUG"

http://nbconvert.readthedocs.io/en/latest/
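
If you have several notebooks to run this way, you could wrap the same command in a small Python script. This is just a sketch: `nbconvert_command` and `run_notebook` are made-up helper names, the flags are the ones from the command above, and it assumes `jupyter` is on your PATH on the server.

```python
import subprocess

def nbconvert_command(notebook, output, log_level='DEBUG'):
    # Build the argument list for executing a notebook with no cell timeout.
    # Passing a list (no shell) means the values need no extra quoting.
    return [
        'jupyter', 'nbconvert',
        '--to', 'notebook', '--execute', notebook,
        '--output', output,
        '--ExecutePreprocessor.timeout=-1',
        '--Application.log_level=' + log_level,
    ]

def run_notebook(notebook, output):
    # Raises CalledProcessError if nbconvert exits with a non-zero status.
    subprocess.run(nbconvert_command(notebook, output), check=True)
```

For example, `run_notebook('redux.ipynb', 'output_redux.ipynb')` runs the same job as the command line above.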
