Currently I’m running jupyter notebooks on AWS (because I have credits left over there).
The problem is that if the model is large and takes a long time to train, then as soon as the SSH connection between my machine and EC2 breaks, the browser session effectively dies, the notebook breaks, and I can’t see the results of training (and have to start all over).
So I’m looking for a solution that would let me train models for long periods of time on AWS, even when my local machine is disconnected.
Some thoughts I had:
I tried using VNC to connect to the machine and run a notebook locally there, but apparently deep learning AMIs ship “headless”, so there is no desktop environment…
I thought about taking the code from the notebook and preparing a training file… but I have no idea how to do that. Also, that doesn’t seem very effective from an experimentation viewpoint.
I tried looking for a way to run notebooks without the browser, but there doesn’t seem to be one?
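On the training-file idea above, here is a minimal sketch of how a script could be started so that it is no longer tied to the SSH terminal. It assumes a Linux instance; `launch_detached` and the log file name are made up for illustration, and the command you pass in would be your own training script.

```python
import subprocess


def launch_detached(command, log_path):
    """Start `command` in its own session and send its output to `log_path`."""
    log_file = open(log_path, "ab")  # append, so output from earlier runs is kept
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # the job never waits for keyboard input
        stdout=log_file,            # everything it prints goes to the log file
        stderr=subprocess.STDOUT,   # errors go to the same log file
        start_new_session=True,     # POSIX only: leave the SSH terminal's session
    )
    log_file.close()  # the child process keeps its own copy of the file handle
    return process
```

Something like `launch_detached([sys.executable, "train.py"], "train.log")` (with `train.py` being a hypothetical file name) should keep running after you disconnect, and you can read `train.log` when you log back in to see how far training got.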
This is my first time working with notebooks / ML, so please treat me like I’m 5. Any help super appreciated!
@mrfabulous1 one thing you can do is log how many epochs you got through with fit one cycle, save the model at that point, and then continue training from there later (not sure if this was suggested!)
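A rough sketch of that save-and-resume idea, with no deep learning library involved so it runs anywhere. `train_with_checkpoints` and `train_one_epoch` are hypothetical names, and the "model" here is just something JSON can store; with fastai you would presumably save and reload the real model with `learn.save` and `learn.load` instead.

```python
import json
from pathlib import Path


def train_with_checkpoints(total_epochs, checkpoint_path, train_one_epoch):
    """Run up to `total_epochs`, saving progress after every epoch so a restart resumes."""
    path = Path(checkpoint_path)
    # Pick up from the saved state if a checkpoint already exists.
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"epochs_done": 0, "model": None}

    for epoch in range(state["epochs_done"], total_epochs):
        state["model"] = train_one_epoch(state["model"], epoch)
        state["epochs_done"] = epoch + 1
        # Write to a temporary file and then rename it, so a crash in the
        # middle of writing cannot leave a half-written checkpoint behind.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(path)
    return state
```

If the run dies after epoch 3 of 10, calling the function again with the same checkpoint path starts at epoch 4 rather than from scratch.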
Hey @m_ke - thanks for the reply. I’m already doing --no-browser, but that only means that JN doesn’t try to open a browser when you launch it. Doesn’t change the fact that if you’re training something from a browser later and close the browser - it breaks.
Could you explain to a 5 year old what HTTP access to a remote server allows you to do that I can’t do through SSH? Like, what are you gaining here? I don’t come from an eng background, learning on the fly.
As long as you do not use Colab-specific libraries (e.g., to access gdrive), File > Download .py from the menu will give you a Python script.
Also, shell commands from the notebook (the lines starting with !) will not run directly in that script; however, you can always run them from the shell without the !.
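If you would rather do the conversion yourself, the two points above can be sketched in a few lines of Python. This is only an illustration: `notebook_to_script` is a made-up helper, it assumes the usual notebook format 4 layout (a top-level "cells" list), and it simply comments out any line starting with ! or % so the resulting file stays valid Python.

```python
import json


def notebook_to_script(notebook_path, script_path):
    """Copy code cells from an .ipynb file into a .py file, commenting out ! and % lines."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)

    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        # The source is stored either as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if line.lstrip().startswith(("!", "%")):
                # Shell commands and magics are not valid Python in a script.
                line = "# " + line
            lines.append(line)
        lines.append("")  # blank line between cells

    with open(script_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```

The commented-out lines are the ones to run by hand in the shell (without the !) before starting the script.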
Just a fancy interface: step-by-step execution during development/debugging, easy documentation of what you do, things like that.