I wasn’t sure if learn.save was fully persisting the complete model (weights and all) on the hard disk, so it could be simply loaded in a brand new session / notebook, or if it merely recorded information that is accessible and makes sense only within the currently running session.
To make this clear, if I created something I like and used learn.save, then powered down my AWS instance and came back the next day, can I use learn.load to reload the fully configured model into a new notebook on my newly running instance, without any preparatory work apart from loading libraries (and possibly data?), in my brand new session?
Is this something you could demonstrate, or have already demonstrated or blogged about? This matters because I have lost heaps of time and work whenever my internet connection dropped, so I want to see if there is a way to get back to my last saved position more quickly.
I finally got around to trying this, and got an error. Are you able to help me understand what I did wrong? I tried finding something that looked like this saved name on my hard disk but couldn’t locate it. Does it get affected by any other process, such as updating code through git or a conda env upgrade?
Yes, if the model definition has changed, it won’t be able to load it. Also, if you save with precompute=False, you must load with the same value (and vice versa).
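On locating the saved file: here is a minimal sketch for searching a folder tree for anything saved under a given name. It assumes that learn.save writes the weights into a models folder under your data path with an .h5 extension, which is my understanding of the library version used in the course, so check your own install. The function name find_saved_weights and the example names "data" and "my_model" are hypothetical.

```python
from pathlib import Path


def find_saved_weights(root, name):
    """Return files under `root` named `name` with any extension, newest first."""
    root = Path(root).expanduser()
    if not root.is_dir():
        # Nothing to search if the folder does not exist.
        return []
    # rglob walks all subfolders, so a models/ folder at any depth is covered.
    matches = [p for p in root.rglob(name + ".*") if p.is_file()]
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # "data" and "my_model" are placeholders for your data path and saved name.
    for path in find_saved_weights("data", "my_model"):
        print(path)
```

If nothing turns up under the data path, the file was probably written relative to a different path than the one you are searching, so it is worth trying the search from your home directory as well.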
As I mentioned earlier in the thread, there’s nothing to see. Just grab the AWS Deep Learning AMI, then simply install Anaconda, git clone the repo, and conda env update, and you’ll be ready to go!
At work we do not have access to community AMIs, so I have to recreate the box from vanilla Ubuntu. I just wanted to make sure I install the correct, compatible versions of CUDA and cuDNN.
After that I can always install Anaconda, git clone the repo and conda env update to get the libraries.
@arunabh Spin up an instance with plain Ubuntu 16.04 and try running the Paperspace bash script here. If you get any errors in the initial few lines, comment them out or remove them. It should work fine, I suppose.
Still does not work. "torch not found" is an error that pops up again and again. Also, a lot of Python libraries are not installed when running the Jupyter notebook.
Can you post the error you got? Did you run the script over SSH into the instance, using software like PuTTY? The web GUI is sluggish.
If you still face the problem, try the following:
Check whether you have the CUDA drivers, cuDNN and Anaconda set up. If not, run the commands from the Paperspace script line by line.
Do a git clone of the fastai repo. Delete the fastai environment, if present, using: conda env remove -n fastai
Create a new environment using the environment.yml present in the fastai repo.
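Once the environment is recreated, a quick check from inside the notebook can tell you whether the kernel is really using it. This is only a sketch: the helper name missing_modules and the list of modules to check are my own choices, not something from the fastai repo.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [n for n in names if importlib.util.find_spec(n) is None]


# The interpreter path should normally point inside the fastai environment.
print("Interpreter:", sys.executable)
print("Missing:", missing_modules(["torch", "numpy", "pandas"]))
```

If the interpreter path is not inside the fastai environment, the notebook was most likely started from a different environment, which would explain the "torch not found" error even after a clean install.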
Hope it helps.
~Gokkul