Performance Issues in Azure (800+ seconds for lesson 1)

I am trying to use an Azure NC6 machine and the speed is pretty slow (~800+ seconds for the lesson 1 notebook training).
I’ve moved to an NC12 (two GPUs) and it’s still very slow. I guess that’s because it only uses one GPU anyway: when I run nvidia-smi, only one GPU seems to be working (judging by Volatile GPU-Util).
Both NC6 and NC12 use the K80 GPU. Here are the details on these machines.
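To check which GPU is actually doing the work, nvidia-smi can also report utilization in a machine-readable form. The sketch below parses the output of nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits, assuming nvidia-smi is on the PATH and reports numeric utilization; the helper names and the 10% "busy" threshold are hypothetical choices for illustration. As far as I know, TensorFlow places a single model's ops on the first GPU by default, so seeing only one busy GPU on an NC12 is expected unless the model is explicitly split across devices.

```python
import subprocess

# Fields accepted by `nvidia-smi --query-gpu`; "csv,noheader,nounits" drops
# the header row and the trailing "%" so each line looks like "0, 97".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_utilization(csv_text):
    """Map each GPU index to its utilization percentage."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        index, util = (field.strip() for field in line.split(","))
        usage[int(index)] = int(util)
    return usage


def busy_gpus(usage, threshold=10):
    """Return the sorted indices of GPUs above `threshold` percent (hypothetical cutoff)."""
    return sorted(i for i, pct in usage.items() if pct > threshold)


def query_gpu_utilization():
    """Run nvidia-smi once and parse the result (needs the NVIDIA driver installed)."""
    result = subprocess.run(
        QUERY,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode that also works on Python 3.6
        check=True,
    )
    return parse_gpu_utilization(result.stdout)
```

Calling query_gpu_utilization() a few times while model.fit is running should show whether the second GPU ever leaves 0%.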

I am using Python 3.6 and Keras 2, as described here.
I’m using TensorFlow as the backend, with keras.json containing:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

I also tried the Theano backend, but I still get around 800 seconds elapsed time.
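It can be worth confirming which settings a notebook really picks up, since Keras 2 reads ~/.keras/keras.json by default but an exported KERAS_BACKEND environment variable overrides the file’s "backend" key. Below is a minimal Python sketch of that precedence, assuming the default config location; the function names are hypothetical and not part of Keras. Inside a running notebook, keras.backend.backend() reports the backend actually in use.

```python
import json
import os

# Default location Keras 2 reads its settings from.
DEFAULT_CONFIG = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")


def load_keras_config(path=DEFAULT_CONFIG):
    """Read keras.json into a dict (JSON requires straight ASCII quotes)."""
    with open(path) as f:
        return json.load(f)


def apply_backend_override(settings, environ):
    """Return a copy of settings with KERAS_BACKEND, if set, taking precedence."""
    resolved = dict(settings)
    if "KERAS_BACKEND" in environ:
        resolved["backend"] = environ["KERAS_BACKEND"]
    return resolved


if __name__ == "__main__":
    if os.path.exists(DEFAULT_CONFIG):
        print(apply_backend_override(load_keras_config(), os.environ))
```

Keeping the override logic in a pure function makes it easy to check both cases without touching the real environment.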

I see people getting ~200 seconds for fitting, while with these “strong” servers I get about 4x slower performance.

Any suggestions? Maybe @dradientgescent, who reported 229 seconds?

Are you using the GPU version of TensorFlow and not the CPU version, i.e. pip install tensorflow-gpu?
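A quick way to answer that from Python is to ask TensorFlow which devices it can see. This sketch uses device_lib.list_local_devices() from tensorflow.python.client, which is an internal module rather than a public API but is commonly used for this check; the wrapper name is hypothetical. An empty list suggests the CPU-only package is installed or the CUDA/cuDNN setup is not being picked up.

```python
def list_tensorflow_gpus():
    """Return the names of the GPU devices TensorFlow can see ([] if none)."""
    try:
        # Internal module, but list_local_devices() exists in TF 1.x and 2.x.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # TensorFlow is not installed at all
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


if __name__ == "__main__":
    gpus = list_tensorflow_gpus()
    print(gpus or "No GPU visible: check for the CPU-only tensorflow package")
```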

I don’t use AWS or Azure, but I know AWS will be around 600-650s due to its slower GPUs.
A K80 will run at around 600-650s. I’m using a GTX 1070 locally with a proper setup, which is why I’m able to get 229s.

I use an Azure NC6 VM and I’m getting similar results: 691s with CNMeM enabled, and 900-1000s with CNMeM disabled.
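CNMeM here refers to Theano’s lib.cnmem option, which makes the old CUDA backend preallocate a share of GPU memory up front. One way to enable it is through THEANO_FLAGS, set before Theano is imported. The sketch below assumes the older device=gpu backend (the newer gpuarray backend uses gpuarray.preallocate instead); the helper name and the 0.8 fraction are illustrative only.

```python
import os


def theano_flags(device="gpu", floatx="float32", cnmem=None):
    """Build a THEANO_FLAGS string for the old CUDA backend (hypothetical helper)."""
    flags = ["device=" + device, "floatX=" + floatx]
    if cnmem is not None:
        # Theano reads lib.cnmem=0 as disabled and 0 < x <= 1 as a fraction of
        # GPU memory (larger values are megabytes); this sketch only allows fractions.
        if not 0 <= cnmem <= 1:
            raise ValueError("cnmem must be a fraction between 0 and 1")
        flags.append("lib.cnmem=" + str(cnmem))
    return ",".join(flags)


# Theano reads THEANO_FLAGS when it is first imported, so set it before
# importing theano or keras with the theano backend.
os.environ["THEANO_FLAGS"] = theano_flags(cnmem=0.8)
```

The same setting can instead go in ~/.theanorc; the environment variable is just easier to toggle when comparing timings with and without CNMeM.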

I’m also using an NC6 in Azure. I’ve been spending days trying to get the first lesson to even run, and I stumble from one error to the next since I’m using Keras 2 and trying to use the latest versions of everything. I still haven’t been able to complete a single notebook because of the myriad of errors and the rabbit holes I end up chasing down. Am I missing something? I feel like there should be a cleaner approach. Any help would be appreciated.