How to set up a virtual machine on Azure for running course lessons?

skbisoi · March 27, 2018, 5:38am

Short answer: yes.

Long answer: here are the steps I followed.

After connecting over SSH to the Ubuntu VM (a Microsoft DL VM hosted in the Azure cloud) through a Git Bash terminal, I followed these steps:

Now, move into a directory where you are comfortable installing the fastai repo with its libraries and required packages. I used the default directory, so the commands below use default paths.
Then clone the repo as follows:

git clone https://github.com/fastai/fastai
Once the cloning process finishes, stay in the directory where you ran git clone (the one that now contains the fastai folder) and type:
conda env create -f fastai/environment.yml

The path to environment.yml (i.e. fastai/environment.yml) has to be given explicitly because of this error: https://github.com/conda/conda/issues/3847

Then activate the conda environment with this command:

source activate fastai
The ‘source’ prefix is required in the command above; without it, I got an error.

Once the fastai conda environment is active, type the command below to open Jupyter Notebook:

(fastai)> jupyter notebook

Chris_Palmer · March 27, 2018, 7:47am

Thanks for that information!

Chris_Palmer · March 27, 2018, 9:03pm

Having tried all of the steps outlined here and in other posts, I have created the fastai source and environment on my new NC6 instance.

To create the environment:
~/fastai$ conda env create -f ~/fastai/environment.yml

To activate it:
source activate fastai

But it doesn’t really work for me.

The first issue is that when I open the URL printed by jupyter notebook, I end up on my local system, not on the Azure box.

The second issue is that when I tried import torch from within IPython on the NC6 system, I got the error below. There must be some missing dependency, but I am not sure how to tackle that!

Do you have any suggestions for either or both errors?

Python 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-eb42ca6e4af3> in <module>()
----> 1 import torch

~/.conda/envs/fastai/lib/python3.6/site-packages/torch/__init__.py in <module>()
     54 except ImportError:
     55     pass
---> 56 from torch._C import *
     57
     58 __all__ += [name for name in dir(_C)

ImportError: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so: undefined symbol: mkl_lapack_ao_ssyrdb
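In the traceback, the library that fails is /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so, a system-wide Intel MKL install, rather than a file under ~/.conda/envs/fastai. One thing that may be worth checking (an assumption, not a confirmed cause) is whether LD_LIBRARY_PATH points the loader at that system MKL directory. A minimal Python sketch; the helper name mkl_dirs_on_ld_path is hypothetical:

```python
import os


def mkl_dirs_on_ld_path(environ=None):
    """Return LD_LIBRARY_PATH entries that look like Intel MKL directories.

    Hypothetical helper: it only inspects the variable; it does not prove
    which copy of MKL the dynamic loader actually picks.
    """
    environ = os.environ if environ is None else environ
    entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Keep non-empty entries whose path mentions "mkl" (case-insensitive)
    return [e for e in entries if e and "mkl" in e.lower()]


if __name__ == "__main__":
    suspects = mkl_dirs_on_ld_path()
    print(suspects or "No MKL directories on LD_LIBRARY_PATH")
```

If it lists /opt/intel/mkl/lib/intel64, one experiment is to start ipython with that entry removed from LD_LIBRARY_PATH and retry import torch.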

skbisoi · March 28, 2018, 5:46pm

Regarding this issue:
“The first issue is that when I use the URL supplied when I invoke jupyter notebook I end up in my local system - not on the Azure box”

You are connecting to a Linux VM hosted in the Azure cloud. You can connect to the VM either remotely or through a Bash shell window such as Cygwin or Git Bash on your local Windows machine.

Are you working inside the Azure DL VM itself, or are you connecting through a Linux Bash command window?

If you are connecting to the VM from a Bash command window such as Cygwin or Git Bash, then you open the link in your local browser, but you are using the GPU of the Azure DL cloud VM.

For the second issue, perhaps PyTorch is not installed correctly, so try reinstalling PyTorch.

Chris_Palmer · March 28, 2018, 6:04pm

Thanks for replying, Susant.

I’ve connected through the Windows Subsystem for Linux (WSL) Bash window.

I’ve tried the recommended approach ssh -L 8888:127.0.0.1:8888 myserverpaddress

I also tried without the -L 8888:127.0.0.1:8888
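For reference, -L 8888:127.0.0.1:8888 tells ssh to listen on port 8888 of the local PC and forward those connections to 127.0.0.1:8888 as seen from the VM, where Jupyter runs. If a local Jupyter already holds port 8888, ssh generally only prints a "bind ... Address already in use" warning and still opens the shell, so the browser keeps reaching the local server. A small Python sketch that builds the command with a different local port; the helper name ssh_tunnel_args and the choice of 8889 are assumptions for illustration:

```python
import shlex


def ssh_tunnel_args(server, local_port=8888, remote_port=8888, user=None):
    """Build an ssh argument list forwarding local_port to the VM's Jupyter.

    -L local_port:127.0.0.1:remote_port means: listen on local_port on this
    machine and forward connections to 127.0.0.1:remote_port as seen from
    the remote server.
    """
    target = f"{user}@{server}" if user else server
    return ["ssh", "-L", f"{local_port}:127.0.0.1:{remote_port}", target]


if __name__ == "__main__":
    # Use 8889 locally so a Jupyter server already on this PC's port 8888
    # cannot intercept the browser request.
    print(shlex.join(ssh_tunnel_args("myserverpaddress", local_port=8889)))
```

With the tunnel on 8889, the URL printed on the VM would be opened locally as http://localhost:8889/ with the same token.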

Once there, I start jupyter notebook --no-browser and paste the URL I am given into the browser on my local PC. I just end up looking at the notebooks running on my local PC, whose Jupyter server is also on port 8888. I want to see the notebooks that are on the server.

For PyTorch, I thought that setting up the fastai environment should install it correctly. It starts, but it seems to need something it hasn’t got…

Perhaps something is broken in the latest Pytorch?

Or perhaps it’s because CUDA isn’t in sync?

When I run nvcc -V I get this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

But when I look at the pytorch package I have this:
pytorch: 0.3.1-py36_cuda9.0.176_cudnn7.0.5_2 pytorch [cuda90]

Can I safely upgrade CUDA, or is that fixed on the VM? Otherwise, I can step back to an earlier version of PyTorch, but I know it handles some things differently, which could be an issue for the fastai programs, which expect CUDA 9.0.
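The two version strings above can be compared mechanically. A minimal Python sketch that pulls the major.minor CUDA version out of the nvcc -V output and out of the conda PyTorch build string; the helper names are hypothetical, and the regexes assume the exact formats shown in this post. Note that this only compares the system toolkit with the PyTorch build tag; whether a mismatch actually matters depends on the installed driver, so treat it as a quick consistency check.

```python
import re


def nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc -V` output, e.g. 'release 8.0'."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None


def pytorch_build_cuda(build_string):
    """Extract 'major.minor' from a conda PyTorch build string such as
    '0.3.1-py36_cuda9.0.176_cudnn7.0.5_2'."""
    m = re.search(r"cuda(\d+\.\d+)", build_string)
    return m.group(1) if m else None


if __name__ == "__main__":
    nvcc_out = "Cuda compilation tools, release 8.0, V8.0.61"
    build = "0.3.1-py36_cuda9.0.176_cudnn7.0.5_2"
    toolkit, built_for = nvcc_release(nvcc_out), pytorch_build_cuda(build)
    print(f"system toolkit {toolkit}, PyTorch built for CUDA {built_for}")
    if toolkit != built_for:
        print("Versions differ")
```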

Regarding the notebooks - here are the ports that are open - perhaps I need to add 8888?

skbisoi · March 28, 2018, 6:47pm

Yup, you have to add port 8888 for Jupyter Notebook.

Chris_Palmer · March 28, 2018, 7:28pm

I’ve done that, but I am still getting the same behaviour

manikanta_s · March 29, 2018, 9:19am

Can you try closing your local server running on 8888 and then connecting to the instance? I guess your request from the browser is being served first by the local Jupyter server.
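This can be checked on the local PC before opening the tunnel: if something already listens on local port 8888, requests to localhost:8888 go to it. A small Python sketch (the helper name port_in_use is hypothetical) that tries to bind the port; a bind failure means the port is very likely taken, though an OSError can occasionally have other causes:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails, i.e. the port looks taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process (e.g. a local Jupyter)
            # already listens on this port.
            return True
    return False


if __name__ == "__main__":
    print("Local port 8888 in use:", port_in_use(8888))
```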

Regarding PyTorch, I tried it 10 days ago and it was working fine. Let me check again.

Chris_Palmer · March 29, 2018, 11:11am

Without the local server running I get a clear error message:

This site can’t be reached
localhost refused to connect.
Search Google for localhost 8888
ERR_CONNECTION_REFUSED

Following a suggestion I found on the internet, here is the output of netstat:

netstat -an | grep "LISTEN "
tcp        0      0 127.0.0.1:8081          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:3476          0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:8001          0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::8000                 :::*                    LISTEN
tcp6       0      0 127.0.0.1:8005          :::*                    LISTEN
tcp6       0      0 :::8009                 :::*                    LISTEN
tcp6       0      0 :::8080                 :::*                    LISTEN
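Nothing in that list is listening on 8888, which suggests Jupyter was not running on the VM (or not on that port) when netstat ran. A small Python sketch that extracts the listening ports from netstat -an output so the check for 8888 can be repeated after starting jupyter notebook; the function name listening_ports is hypothetical and it assumes the six-column Linux layout shown above:

```python
def listening_ports(netstat_text):
    """Collect local port numbers from `netstat -an` LISTEN lines."""
    ports = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) >= 6 and fields[-1] == "LISTEN":
            # The port follows the last ':' for both IPv4 and IPv6 forms
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


sample = """\
tcp        0      0 127.0.0.1:8081          0.0.0.0:*               LISTEN
tcp6       0      0 :::8000                 :::*                    LISTEN
"""
print(sorted(listening_ports(sample)))  # [8000, 8081]
print(8888 in listening_ports(sample))  # False
```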

skbisoi · March 29, 2018, 5:00pm

I got a connection timed out error when I used an office VPN client (FortiClient). So check whether a virus scanner or VPN is blocking the connection. After I shut down the VPN and connected through my local broadband (LAN connection), it worked fine.
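A timeout and a refused connection point to different problems: a timeout often means something in between (VPN, firewall, network security rule) silently drops packets, while "refused" means the host answered but nothing listens on that port. A minimal Python sketch that classifies a TCP connection attempt; the function name probe and the address 203.0.113.10 (a documentation-only IP) are placeholders:

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"  # packets dropped somewhere on the path
    except ConnectionRefusedError:
        return "refused"  # host reachable, but no listener on the port
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Placeholder address: replace with the VM's public IP
    print(probe("203.0.113.10", 22))
```

Running it once with the VPN on and once with it off should show whether the VPN is what turns "open" into "timeout".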

Regarding the PyTorch/CUDA compatibility issue, let me check whether any forum has answers for it.

skbisoi · March 29, 2018, 5:05pm

Is your path correct? Why is there a ~ after -f in your command below? We should give the absolute path, like 'conda env create -f /fastai/environment.yml'.

But yours looks like this:

~/fastai$ conda env create -f ~/fastai/environment.yml

skbisoi · March 29, 2018, 5:10pm

Maybe you are getting the PyTorch import error because of an environment activation issue. Try to resolve the environment issue first.

Check this link for possible import issues.

Chris_Palmer · March 29, 2018, 6:43pm

I was in the fastai directory when I issued conda env create -f ~/fastai/environment.yml, so to use a path I had to give it relative to home. I can go up to home and redo it, but I wouldn’t expect it to make much difference. In fact, I suspect running it without the path information should work if I am in the directory that contains the yml file?
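The shell expands ~ to the home directory before conda ever sees the argument, so -f ~/fastai/environment.yml is already an absolute path, and -f environment.yml run from inside ~/fastai names the same file. A small Python sketch that checks this in a temporary directory; the helper name same_file_two_ways is hypothetical, and it relies on os.path.expanduser reading $HOME on POSIX, which mirrors the shell's tilde expansion for the current user:

```python
import os
import tempfile


def same_file_two_ways(home):
    """Resolve '~/fastai/environment.yml' and, from inside ~/fastai,
    'environment.yml'. `home` stands in for $HOME in this demo."""
    repo = os.path.join(home, "fastai")
    os.makedirs(repo, exist_ok=True)
    open(os.path.join(repo, "environment.yml"), "w").close()

    old_home, old_cwd = os.environ.get("HOME"), os.getcwd()
    try:
        # Tilde expansion uses $HOME, like the shell does before conda runs
        os.environ["HOME"] = home
        via_tilde = os.path.expanduser("~/fastai/environment.yml")
        # A bare file name is resolved against the current directory
        os.chdir(repo)
        via_cwd = os.path.abspath("environment.yml")
    finally:
        os.chdir(old_cwd)
        if old_home is None:
            os.environ.pop("HOME", None)
        else:
            os.environ["HOME"] = old_home
    return os.path.realpath(via_tilde), os.path.realpath(via_cwd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = same_file_two_ways(tmp)
        print(a == b)  # True
```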

I don’t believe I have an environment issue; source activate fastai works just fine. I will try importing other libraries and see how things go…
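Whether the activated environment is really the one running ipython can be checked from inside Python. A minimal sketch, assuming conda's activation sets CONDA_DEFAULT_ENV and CONDA_PREFIX as it typically does; the helper name conda_env_info is hypothetical:

```python
import os
import sys


def conda_env_info(environ=None):
    """Report the activated conda env and the prefix of the running Python."""
    environ = os.environ if environ is None else environ
    return {
        "active_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
        "python_prefix": sys.prefix,
    }


if __name__ == "__main__":
    info = conda_env_info()
    for key, value in info.items():
        print(f"{key}: {value}")
    prefix = info["conda_prefix"]
    # A mismatch means ipython/jupyter comes from a different environment
    if prefix and os.path.realpath(prefix) != os.path.realpath(sys.prefix):
        print("Warning: running Python is not from the activated environment")
```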

Chris_Palmer · March 29, 2018, 6:47pm

Thanks - I looked at that.

I was surprised that you hadn’t mentioned a possible CUDA mismatch, but this post mentions CUDA mismatch as a major issue.

Did you do anything in your setup to upgrade CUDA drivers to CUDA9 for your VM?
Or to explicitly install CUDA8 version of Pytorch?
Have you recently (last few days) performed a conda env update?
If you import torch and print(torch.__version__) what does it say?
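A small Python sketch that answers that question without crashing when the import itself fails; the helper name torch_report is hypothetical, while torch.__version__, torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes (torch.version.cuda is read with getattr in case an old build lacks it):

```python
import importlib


def torch_report():
    """Summarize the installed PyTorch, or the error raised on import."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # e.g. an "undefined symbol" error from a mismatched shared library
        return {"ok": False, "error": str(exc)}
    return {
        "ok": True,
        "version": torch.__version__,
        # CUDA version the binary was built against (None for CPU builds)
        "built_cuda": getattr(torch.version, "cuda", None),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_report())
```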

Chris_Palmer · March 29, 2018, 7:05pm

Hi Susant.

I didn’t understand this reply…

Chris_Palmer · March 29, 2018, 7:55pm

OK. I started IPython and executed the lines of imports.py one by one. They were all OK except for the import of seaborn (see below). That indicates to me that the environment is OK; it's just PyTorch that isn't.

import seaborn as sns

QXcbConnection: Could not connect to display
Aborted (core dumped)

I found a reference to this which says that in an IPython environment you may have to set the Qt platform through os.environ, and this worked for me:

import os
# The VM has no X display, so tell Qt to render offscreen
os.environ['QT_QPA_PLATFORM'] = 'offscreen'
import seaborn as sns

No error!
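The same check can be run non-interactively. Because the Qt failure aborts the whole process (core dump) rather than raising a Python exception, try/except alone cannot catch it, so this sketch applies the offscreen setting first whenever DISPLAY is unset, then imports each module and collects failures. The helper name check_imports and the module list in the usage line are hypothetical; the actual contents of imports.py are not reproduced here.

```python
import importlib
import os


def check_imports(names, environ=None):
    """Import each module by name; return {name: error} for failures.

    With no X display (DISPLAY unset), default Qt to its 'offscreen'
    platform first, since a Qt display failure aborts the process
    instead of raising an exception.
    """
    environ = os.environ if environ is None else environ
    if not environ.get("DISPLAY"):
        environ.setdefault("QT_QPA_PLATFORM", "offscreen")
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time failure
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    print(check_imports(["numpy", "seaborn", "torch"]))
```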

I will see if I can find advice on upgrading the VM to CUDA 9; failing that, I will downgrade to a CUDA 8 version of PyTorch.

Chris_Palmer · March 29, 2018, 10:56pm

I have upgraded to CUDA 9, but unfortunately I still get the problem:

Chris_Palmer@FASTAI:~$ cat /usr/local/cuda/version.txt
CUDA Version 9.0.176
Chris_Palmer@FASTAI:~$ source activate fastai
(fastai) Chris_Palmer@FASTAI:~$ ipython
Python 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-eb42ca6e4af3> in <module>()
----> 1 import torch

~/.conda/envs/fastai/lib/python3.6/site-packages/torch/__init__.py in <module>()
     54 except ImportError:
     55     pass
---> 56 from torch._C import *
     57
     58 __all__ += [name for name in dir(_C)

ImportError: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so: undefined symbol: mkl_lapack_ao_ssyrdb

skbisoi · March 30, 2018, 4:05am

I just left it to you to check the issue described at that link.

skbisoi · March 30, 2018, 4:07am

So we are back to square one. I will cross-check again when I get home; I am in the office now.

nkakhani · October 13, 2021, 7:47am

Dear @vrajjshah, did you partition your drive? How did you fix your problem?