AWS AMI available for testing

sabzo · November 10, 2017, 3:43am

Hey… Did you get that resolved? I’m not sure how to go about that.

jeremy · November 10, 2017, 3:45am

It’s only necessary on crestle, where you won’t get that error.

sabzo · November 10, 2017, 3:50am

ah, thank you.

Robi · November 10, 2017, 9:47am

Thank you! I had previously checked a couple of tmux ninja cheat sheets without finding any hint about this shift-mouse combination.

Chris_Palmer · November 11, 2017, 11:14pm

Hi @ramesh, thought I would try this, but stuck on first base. Can you send instructions on adding port 8888? Is this screen snapshot here what I should be doing?

ramesh · November 12, 2017, 4:03am

Hi Chris - This is right. Once you update your security group with port 8888 open as you have above, the instance will be able to send and receive on port 8888. Then run the commands I listed earlier in this thread (AWS AMI available for testing), and you should be able to access the Jupyter notebook in the browser at:

http://<ec2 address>:8888

Make sure you run jupyter notebook inside a tmux session. That way the notebook keeps running even if your ssh session gets disconnected.
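If the browser still cannot reach the notebook after the security group change, it can help to first check whether the port accepts TCP connections at all. A minimal sketch using only the Python standard library; `port_is_open` is a name made up here, and you would pass your own instance's public address in place of the placeholder:

```python
import socket


def port_is_open(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (replace the placeholder with your instance's address):
# port_is_open("<ec2 address>", 8888)
```

A True result only shows that something is listening and reachable on that port; it does not confirm that the listener is Jupyter. A False result points at the security group, the instance address, or the notebook server not running.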

Chris_Palmer · November 12, 2017, 12:32pm

All working as you described - thank-you very much for such clear and thorough instructions!

tweber · November 12, 2017, 9:23pm

Is there an AMI that would work on the p3 instances? Or, what kind of tweaks might I try to get this AMI working? This is great for working through notebooks but sometimes I crave the speed I see on paperspace instances when trying stuff out.

jeremy · November 12, 2017, 9:35pm

No AMI, but if you simply install anaconda, git clone the repo, and conda env update you’ll be ready to go!

Chris_Palmer · November 13, 2017, 3:05am

Hi Ramesh

I am still not quite getting the results I want.

First of all, I do not have a static IP address. Do I need one?
Secondly, I still need to tunnel, including -L 8888:localhost:8888, otherwise I cannot connect to Jupyter notebook from my browser.
When I connect to http://localhost:8888 I now have to enter a password, so something has "worked".
But if my connection drops, as it did today due to a momentary loss of Internet connection, then even if I reconnect to my running AWS server I have to reload everything in the notebook and start again.
Which is what I am trying to fix.

Is there any way we can get the learning process to proceed without needing to be permanently connected to the notebook via my flaky internet connection? Should we do it all in the terminal window and bypass the Jupyter Notebook? It really is being a pain and is the weak link.

ramesh · November 13, 2017, 3:39am

If your Internet is flaky, I would suggest not using tunneling to run Jupyter Notebook. Access Jupyter Notebook directly in the browser, as described earlier in this thread (AWS AMI available for testing). That approach does not go through localhost:8888; you connect to http://<ec2 address>:8888 instead.

You don’t need a static IP for this, but I would recommend that you run Jupyter Notebook inside a tmux session on your AWS instance. That way, if your ssh -i <pem key> ubuntu@<ec2 address> session gets disconnected, the tmux session is still running, and that keeps your jupyter process running too.

Chris_Palmer · November 13, 2017, 8:12pm

Thanks for replying again Ramesh!

I think I got confused about the directions and had a mixed-up environment. Having started again without using -L 8888:localhost:8888 in my tunnelling, things work as you have described.

I haven’t had a chance (luckily!) to test it, but in this scenario, even if I lose my connection to my EC2 instance, that is, even my terminal window as well as my browser are no longer communicating with the server, will the server keep running and I can reconnect once my internet is working again? If so, will I be able to navigate to the running notebook, open it, and I should still see it chugging away through whatever process I had set running?

ramesh · November 13, 2017, 8:17pm

Glad to be of help. You can pick up from where you left off by opening the same notebook, but you still need to Save the Notebook periodically. One caveat - If a particular cell was running and you lost your internet connection, you may not see the results. You can pick up from the previous cell that executed successfully.

A notebook is not intended for a run-it-and-check-later workflow. It’s more for interactive sessions. If you have long-running cells, they should ideally be run as a .py file on the command line.
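One way to act on this is to put the long-running work in a script that writes its progress to a file, start it inside tmux, and read the file whenever you reconnect. A rough sketch with a placeholder where the real training would go; `run_epoch`, `train`, and `train_log.txt` are names invented for illustration and are not part of the course library:

```python
import time
from pathlib import Path


def run_epoch(epoch: int) -> float:
    """Placeholder for one epoch of real training; returns a made-up loss."""
    return 1.0 / (epoch + 1)


def train(n_epochs: int, log_path: str = "train_log.txt") -> None:
    """Run n_epochs and append one progress line per epoch to log_path."""
    log = Path(log_path)
    for epoch in range(n_epochs):
        loss = run_epoch(epoch)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # Open, append, and close every epoch so the file on disk stays current.
        with log.open("a") as f:
            f.write(f"{stamp} epoch={epoch} loss={loss:.4f}\n")
```

Saved as a .py file with a final call such as train(n_epochs=3) and started from a tmux window, the job no longer depends on the browser or the ssh session staying connected; the log file shows how far it got.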

Chris_Palmer · November 15, 2017, 7:04am

Is it possible to run the training on the command line and then persist it so you can later load it into a notebook? Does the learn.save do that, or is that more for creating a check point in a currently running model?

jeremy · November 15, 2017, 2:02pm

learn.save does indeed do that. Although I don’t understand what you mean by "is that more for creating a check point in a currently running model"…

Chris_Palmer · November 16, 2017, 3:49am

Hi @Jeremy

I wasn’t sure if learn.save was fully persisting the complete model (weights and all) on the hard disk, so it could be simply loaded in a brand new session / notebook, or if it merely recorded information that is accessible and makes sense only within the currently running session.

To make this clear, if I created something I like and used learn.save, then powered down my AWS instance and came back the next day, can I use learn.load to reload the fully configured model into a new notebook on my newly running instance, without any preparatory work apart from loading libraries (and possibly data?), in my brand new session?

jeremy · November 16, 2017, 4:02am

Yup. I mean - you first need to construct the model (e.g. using pretrained()), but then you load the weights into it.
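The order of operations here (construct the model first, then load the saved weights into it) can be illustrated without the course library. The sketch below is a toy stand-in using only the standard library: `TinyModel` and its `save`/`load` methods are invented for illustration and are not fastai's implementation; they only mirror the pattern.

```python
import json
from pathlib import Path


class TinyModel:
    """Toy stand-in for a learner: a fixed architecture plus trainable weights."""

    def __init__(self, n_inputs: int):
        # The "architecture" is set at construction time; it is not stored in the file.
        self.n_inputs = n_inputs
        self.weights = [0.0] * n_inputs

    def predict(self, xs):
        return sum(w * x for w, x in zip(self.weights, xs))

    def save(self, path: str) -> None:
        # Only the weights are written to disk.
        Path(path).write_text(json.dumps(self.weights))

    def load(self, path: str) -> None:
        weights = json.loads(Path(path).read_text())
        if len(weights) != self.n_inputs:
            raise ValueError("saved weights do not match this architecture")
        self.weights = weights
```

In a fresh session you would build `TinyModel(3)` again and then call `load` on the saved file. The file survives a shutdown because it lives on disk, but it is only useful once an object of the matching shape exists to receive it.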

Chris_Palmer · November 16, 2017, 4:06am

Thanks Jeremy

Is this something you could demonstrate, or have demonstrated or blogged about? The reason why this is important is because I have lost heaps of time and work when I drop my internet connection, so I was wanting to see if there is a way to recover more quickly back to my last saved position.

jeremy · November 16, 2017, 4:46am

Once you’ve created your learn object in the usual way, just type learn.load('filename'), and that’s it!

Chris_Palmer · November 16, 2017, 4:48am

Just like you have been doing all along?

OK, I will try it on one I saved a few days ago - thanks!