Py3 and tensorflow setup

Thanks, I’ll consider it. Especially the part on speed might be interesting. I’ve had some speed issues occasionally with part I.

+1 for Anaconda. Not only is it a separate environment, you can actually create different setups within it, each with specific libraries installed. I wasn’t sure about switching to it either, but after doing so for this course I’m never going back.


It works! :wink:

Because I’m using spot instances, which I set up and tear down every time, I updated my setup script to reflect the above changes:
https://github.com/jonas-pettersson/fast-ai/blob/master/scripts/install-gpu-tf.sh

Here is a description of how to set up an AWS spot instance using the script:


For spot instances, would it be easier to have Anaconda etc installed onto an EBS volume, and then simply attach the volume to the new instance after you create it? (Which you could do in your script). For instance, you could attach the EBS volume as your home directory, which means you’ll also have your configuration changes saved automatically.


Thanks, I will try that. I believe I still pay something per GiB for a volume even when it is not attached to an instance, but it is not much. If I can save the time for file transfer it might be worth it. I will try it and let you know.

UPDATE: hmm… a volume cannot be detached from an instance while it is running, and it is a feature of spot instances that they cannot be stopped, only terminated. With termination the volume is gone, of course, so it seems I’m stuck with the procedure of setting everything up again. Which is fine, at least for me.

UPDATE 2: I was wrong in my belief that the volume of a spot instance cannot be saved: there is an option for that ("DeleteOnTermination": false). Inspired by @slavivanov I am now working on a similar approach. However, I don’t think it is necessary to generate the JSON from text; instead I can use a JSON config file directly, like this:
aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json
I will work on this and update when I have it ready.
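For anyone curious what such a config.json might look like, here is a minimal sketch that writes one out, with the key fields of a spot-fleet request. The field names follow the EC2 spot-fleet API; the AMI ID, IAM role ARN, key name, price, and volume size are all placeholders you would replace with your own values.

```python
import json

# Minimal spot-fleet request config (field names per the EC2 spot-fleet API).
# Every concrete value below is a placeholder -- substitute your own.
config = {
    "IamFleetRole": "arn:aws:iam::123456789012:role/my-spot-fleet-role",  # placeholder
    "SpotPrice": "0.25",
    "TargetCapacity": 1,
    "LaunchSpecifications": [
        {
            "ImageId": "ami-xxxxxxxx",   # placeholder AMI
            "InstanceType": "p2.xlarge",
            "KeyName": "my-key",         # placeholder key pair
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    # Keep the root volume around after the instance
                    # is terminated, as discussed above:
                    "Ebs": {"DeleteOnTermination": False, "VolumeSize": 128},
                }
            ],
        }
    ],
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

After writing the file you would point the CLI at it with aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json, as above.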

This looks like a great approach: Persistent AWS Spot Instances (How to)


Hey everybody!

I’ve been working on migrating some of the models (and pre-trained weights) from Part 1 of this course to TensorFlow. This is mostly so I can export TensorFlow graphs that will run on mobile devices (TensorFlow has decent support for running trained models on mobile devices).

I’ve put together a Python Notebook that walks through the conversion process and some of the gotchas:

It might be useful to some of you. Especially, if you already have a trained model that you want to use with TensorFlow.
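To give a flavor of the main gotcha: convolutional weights saved under Theano dim ordering need both an axis shuffle and a 180-degree kernel flip before TensorFlow will use them, because Theano implements true convolution while TensorFlow implements cross-correlation. The sketch below shows the standard transform in plain NumPy; it is my own illustration of the idea, not necessarily exactly what the notebook does.

```python
import numpy as np

def th_to_tf_conv_weights(w_th):
    """Convert Theano-ordered conv weights to TensorFlow ordering.

    Theano dim ordering:     (nb_filter, channels, rows, cols)
    TensorFlow dim ordering: (rows, cols, channels, nb_filter)

    Besides the axis shuffle, each kernel is rotated 180 degrees,
    because Theano convolves while TensorFlow cross-correlates.
    """
    w = w_th[:, :, ::-1, ::-1]            # flip both spatial axes
    return np.transpose(w, (2, 3, 1, 0))  # axis shuffle th -> tf

# Example: a VGG-style 3x3 conv layer with 3 input and 64 output channels.
w_th = np.random.randn(64, 3, 3, 3)
w_tf = th_to_tf_conv_weights(w_th)
print(w_tf.shape)  # (3, 3, 3, 64)
```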

Cheers!
James

Update (Feb 27th): Sadly, something is not right in my script. Everything works when using Theano dim ordering, but the conversion to TensorFlow dim ordering is broken. Somewhere… I updated the notebook to make things clearer. Could use some help, if anyone else is interested.


Thanks again, I will have a closer look and report any results.

UPDATE: it works very well: you can create a new spot instance, mount an existing volume, and use it in the way @jeremy proposed. I will finalize, test, and document the following scripts, but I am posting them now so anyone interested can already have a look at how I did it.


setup_aws_spot_w_remount.sh: sets up a new spot instance and executes the remount
specification.json: configuration file for creating the spot instance
remount_root.sh: remount script executed on the newly created spot instance
remove_aws_spot.sh: cancel spot request, terminate instance, and remove the (empty) default volume of the spot instance

UPDATE 2: The above scripts are now tested, corrected, and documented. Not everything is checked, and because of the root-volume swap operation they should be used with some care. But they should be useful for anyone taking the approach of using spot instances and mounting an existing volume as root. Further explanations are in the scripts themselves.

After upgrading, will I be able to run the scripts from part-1?

@sakiran It depends. The Vgg model from Part 1 doesn’t work without some modifications, and any saved weights for convolutional layers need significant adjustments to work.

See my post/notebook above for more details.

There wouldn’t be any issue if I have Python 2 and Python 3 installed simultaneously, correct?

@sakiran Python 3 will be your default environment after upgrading; you won’t be using both simultaneously (whichever version is on your path gets used, in this case Python 3).

You shouldn’t have any issues, other than a few particularities of Python 3 syntax that should be simple to fix when you encounter them.
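A few of the particularities you are likely to hit when porting Part 1 notebooks by hand, as a quick illustrative sketch (not an exhaustive list):

```python
# 1. print is a function, not a statement:
print("accuracy:", 0.97)   # Py2's `print "..."` is a SyntaxError in Py3

# 2. / is true division; use // for the old floor division:
assert 7 / 2 == 3.5
assert 7 // 2 == 3

# 3. dict.items()/keys()/values() return views, not lists:
layers = {"conv1": 64, "conv2": 128}
first = list(layers.items())[0]   # wrap in list() if you need indexing

# 4. xrange is gone; range is lazy and covers both uses:
total = sum(range(5))
assert total == 10
```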

@jpuderer are you sure it’s working correctly? Training the model results in a validation accuracy of 0.5000, i.e. random chance for a two-class problem. It doesn’t seem like the model is being optimized properly, unless I’m missing something.

@even Pretty sure. It’s just that I included the sample set (which is too small to train with, but the full set is too large for GitHub). I used the full data set, and it trains to the same level as the Theano version.

I should probably update the notebook with the output from training on the full set, since if you don’t see the small comment, it’s easy to be misled.

I’ll post the results in a bit. It will be a little while, I’m using a CPU at the moment.

Edit (Feb 26th): Something is not quite right. It trains, but not as well as the original Theano network. I must have made a mistake somewhere. I’ll post an update when I know what it is.

Edit (Feb 27th): Something is not right in my script. Everything works when using Theano dim ordering, but the conversion to TensorFlow dim ordering is broken. Somewhere… Could use some help.

Thank you @jpuderer for the good work!
I have just read through the changes to make in the GitHub repo.
But could you please confirm that, with your changes in the GitHub link above, we can run both Part 1 and Part 2 scripts?

We’ll be using a different approach to using vgg and resnet in part 2, so you’ll probably want to make some changes to your scripts based on them. (The model weights in part 1 don’t work in tensorflow, and theano weight files you’ve saved won’t work with tensorflow either.)

Other than that, just about everything should work fine. You can always switch backends by changing your keras config file, of course.
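For reference, the Keras config file in question normally lives at ~/.keras/keras.json. The sketch below writes an equivalent file to the current directory (so as not to clobber a real config); the field names are those of the Keras 1.x config file.

```python
import json

# Written to the current directory for illustration; the real file
# lives at ~/.keras/keras.json. Field names per the Keras 1.x config.
keras_config = {
    "backend": "tensorflow",     # or "theano"
    "image_dim_ordering": "tf",  # or "th" -- must match your saved weights
    "floatx": "float32",
    "epsilon": 1e-07,
}

with open("keras.json", "w") as f:
    json.dump(keras_config, f, indent=2)
```

Note that "backend" and "image_dim_ordering" are set independently, which is exactly why mismatched weights cause the dim-ordering trouble discussed in this thread.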


@Kjeanclaude You would need to maintain two separate sets of weights for running under Theano or TensorFlow, or constantly convert between the two.

It probably isn’t worth the trouble. I would just pick one (i.e. TensorFlow for this course), convert your weights once, and stick with that.


OK, good to hear that.
I was writing something to allow those who want to run the Part 1 scripts locally to do so via a Docker image. But with your work, I think it is more advantageous to make the necessary conversions and then run Part 1 and Part 2 scripts on the same installation (through a VM and virtual GPU for local use).
Great job!

Since I didn’t take Part One with you and am still catching up on the MOOC and the associated homework, it would be ideal to keep the Theano / Python 2 environment separate from the TensorFlow / Python 3 one until I develop sufficient facility to port over the older code. What would be the easiest way to create a separate AMI for the new setup so I can continue chugging along on the old stuff?