Platform: AWS EC2 (DLAMI) ✅

Thanks for the AMI, but I could not find it in us-west-2. What region is it in?

Thanks for the AMI, but I could not find it in the Oregon region.

Also, I just wanted to know whether this AMI is built from the base image (without any frameworks installed) or from the image that has all the major frameworks pre-installed. I'm checking because the latter takes up a huge chunk of your provisioned EBS volume, which could be a waste if we are not using those frameworks.

Cheers !!

1 Like

@sujithvn @aymenim I just created a copy of the image in us-west-2 (Oregon) so you should be able to find it now.

The image is built on plain Ubuntu 18 with just conda, PyTorch, fastai, and the NVIDIA drivers installed, so it should be pretty slim.
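If you want to double-check what's on the image after launching it, something like this should do (a minimal sketch; it assumes the image's conda environment with PyTorch and fastai is already activated):

# Quick sanity check after SSHing into the instance (assumes the image's
# conda environment is active).
import torch
import fastai

print("fastai:", fastai.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())   # should be True on a p2/p3 instance
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))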

3 Likes

@wdhorton Thanks again… will try it today.

Thanks William, I found your image in us-west-2 (Oregon) and I'm back up and running!

I have noticed that it seems slower than the DLAMI. For reference, training the resnet34 model in Lesson 1 with all defaults took 4min 48s, and fine-tuning took 1min 10s.

I just installed fastai using the following on a fresh AWS Ubuntu 16.04 p2.xlarge instance:
conda create -y python=3.6 --name fastai-py3.6
conda activate fastai-py3.6
conda install -y conda
conda install -y pip setuptools
conda install -y -c pytorch pytorch-nightly cuda92
conda install -y -c fastai torchvision-nightly
conda install -y -c fastai fastai
conda uninstall -y fastai
pip install -e .[dev]

I can see that PyTorch can access the GPU, but as reported above the Jupyter kernel dies on create_cnn (e.g. on the lesson1-pets notebook), and I also get an Illegal instruction (core dumped) at create_cnn when running the notebook as a script.
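(By "PyTorch can access the GPU" I mean a basic check like the following passes; a minimal sketch, run inside the fastai-py3.6 env:)

import torch

print(torch.__version__)           # the pytorch-nightly build installed above
print(torch.version.cuda)          # CUDA version the build was compiled against
print(torch.cuda.is_available())   # True here, so the GPU itself is visible
print(torch.cuda.get_device_name(0))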

The exact crash occurs in learner.py when m.eval() is called on the returned dummy_batch tensor, which for cats and dogs is of shape torch.Size([1, 3, 64, 64]), as per below:

def dummy_eval(m:nn.Module, size:tuple=(64,64)):
    "Pass a dummy_batch in evaluation mode in m with size."
    return m.eval()(dummy_batch(m, size))
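In case it helps anyone reproduce this outside of fastai, the failing call boils down to running a model in eval mode on a small dummy tensor. A stripped-down sketch (torchvision's resnet34 stands in for the model create_cnn builds, so this is an approximation rather than the exact fastai code path):

# Stripped-down reproduction sketch (assumes torchvision is installed).
import torch
from torchvision.models import resnet34

m = resnet34()                     # stand-in for the model create_cnn builds
x = torch.zeros(1, 3, 64, 64)      # same shape as the reported dummy batch
if torch.cuda.is_available():
    m, x = m.cuda(), x.cuda()

with torch.no_grad():
    out = m.eval()(x)              # if the build is the culprit, the crash should surface here
print(out.shape)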

It's not possible for me to use a shared AMI, so to get around this I'll try PyTorch 1.0.0… Does anyone have tips for working with PyTorch 1.0.0 and fastai v1?

Adding myself to the list of people suffering from this problem… Aren't there people from AWS who can help us out here?

1 Like

Using Ubuntu 16 worked! (One extra step: install the NVIDIA drivers.)
I can update the official documentation with the latest instructions. @jeremy, should I create a pull request to do this?

That’s great! In the meantime, could you post your install commands here?

# Update
sudo -i
apt-get update && apt-get --assume-yes upgrade

# Install Lib
sudo apt-get --assume-yes install build-essential gcc g++ make \
    binutils htop screen \
    software-properties-common unzip tree awscli cmake

# CUDA 10
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64
sudo dpkg -i cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get --assume-yes install cuda

# CUDA Check
nvidia-smi

# Anaconda
wget https://repo.anaconda.com/archive/Anaconda3-5.3.0-Linux-x86_64.sh
bash Anaconda3-5.3.0-Linux-x86_64.sh -b

# Environment Variables
export CONDA_HOME=/home/ubuntu/anaconda3
export PATH=$CONDA_HOME/bin:$PATH

conda update conda
conda upgrade --all --yes
conda install -c pytorch -c fastai fastai
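After that, a quick sanity check that the install actually reaches the GPU (a minimal sketch; run it with the Anaconda python installed above):

# Post-install sanity check.
import torch

print(torch.__version__, torch.version.cuda)
print("cuda:", torch.cuda.is_available(), "| cudnn:", torch.backends.cudnn.enabled)
x = torch.randn(2, 2, device="cuda")   # tiny op on the GPU, end-to-end driver check
print((x @ x).sum().item())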

4 Likes

Note: use the good old Ubuntu 16 AMI.

1 Like

Sure! Thanks for the offer :slight_smile: I’m not sure what happened with the AWS DLAMI - we had this same problem. Our solution was to use the plain Ubuntu 18 AMI. I’d suggest that over Ubuntu 16.

1 Like

Do we need to use the extra commands provided to install nvidia drivers if we use Ubuntu 18?

Yes, I think all the steps are exactly the same.

@jeremy @astronomy88
Updated the documentation for the Ubuntu 18 AMI.

Cheers

1 Like

Thanks for this… I had spent hours trying to work out what I did wrong until I found this thread.

After following your instructions I am up and running again!

Thanks

Tony

Not sure if this is EC2 specific, but after following the usual “going back to work” instructions after a month, I get the following error when I try to run Jupyter notebook:

$ jupyter notebook
[C 22:48:54.046 NotebookApp] Bad config encountered during initialization:
[C 22:48:54.047 NotebookApp] The 'kernel_spec_manager_class' trait of <notebook.notebookapp.NotebookApp object at 0x7f0823192cf8> instance must be a type, but 'environment_kernels.EnvironmentKernelSpecManager' could not be imported

Note that this is for the fast.ai AWS image used for the class, and not from a new Ubuntu 16/18 EC2 instance.
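(One possible lead, in case it helps: the error says the Jupyter config still points at environment_kernels.EnvironmentKernelSpecManager but the active environment can no longer import it. A quick diagnostic sketch, assuming the config lives at the default ~/.jupyter/jupyter_notebook_config.py:)

# Check whether the module Jupyter's config references is importable from the
# same environment that launches `jupyter notebook`.
try:
    import environment_kernels     # the module named in the error message
    print("environment_kernels is importable from this env")
except ImportError:
    print("environment_kernels is missing from this env; reinstalling it, or removing")
    print("the kernel_spec_manager_class line from ~/.jupyter/jupyter_notebook_config.py,")
    print("should clear the startup error")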

I created an instance using 'Deep Learning AMI (Ubuntu) Version 21.0' as I couldn't find the option for 'Deep Learning AMI (Ubuntu) Version 16.0'.

But I am getting the crash problem described above ('The kernel appears to have died…') when running learn = create_cnn(…).

Can you please let me know if this is the correct AMI or if I should use another?

Many thanks

Jonathan


1 Like

This approach worked for me a few days ago, but I had to start over from scratch and now the kernel is dying as soon as I create a ConvLearner in notebook 1.

These instructions seem to be working: https://github.com/krishnakalyan3/course-v3/blob/aed64af19b34bcf0ddf1263bfd7d0e1744aac884/docs/start_aws.md

4 Likes