Quick Google Colab setup for Part 2 week 1, including the Pascal VOC dataset

The following script handles the fast.ai setup along with the dataset required for Part 2 week 1. Open your notebook, turn on the GPU, and just run the script.

    !pip install https://github.com/fastai/fastai/archive/master.zip
    !pip install opencv-python
    !apt update && apt install -y libsm6 libxext6
    !pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
    !pip3 install torchvision
    !mkdir data
    !wget http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar -P data/
    !wget https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip -P data/
    !tar -xf data/VOCtrainval_06-Nov-2007.tar -C data/
    !unzip data/PASCAL_VOC.zip -d data/
    !rm -rf data/PASCAL_VOC.zip data/VOCtrainval_06-Nov-2007.tar

Thank you @binga for suggesting the edit.
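
If you want a quick sanity check after the script finishes, something like the following could be run in a notebook cell. It is only a sketch: the helper name `missing_paths` is made up for illustration, and the expected folders (`VOCdevkit/VOC2007` from the tar, `PASCAL_VOC` from the zip) are an assumption about how the two archives unpack, so adjust the list if your layout differs.

```python
from pathlib import Path

# Assumed layout after the setup script finishes (not confirmed by the thread);
# edit this list if your archives unpack differently.
EXPECTED = ["VOCdevkit/VOC2007", "PASCAL_VOC"]


def missing_paths(root, expected=EXPECTED):
    """Return the expected sub-paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty result means both folders were found under data/; anything listed points at the download or extraction step that needs re-running.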

Thanks! It’s very useful!

The Google Colab thing is kinda amazing!

Am I missing something or is it free? What are the restrictions/limitations, and how does it compare with using AWS, Paperspace, Crestle, et al.?

It is free. They only let you use the GPU for 12 hours straight, and if the connection is ever lost, you have to start all over again.

Also I’ve heard reports that GPU memory is often shared and it’s only luck which decides how much memory you’ll have :slight_smile:

Great resource, will give it a try. Hopefully the first lesson shouldn't cause resource issues!

Still … in terms of getting up and running quickly, this kinda blew my mind.

I was fully prepared to encounter and troubleshoot a bunch of issues, but instead I can just start coding. It may not be the ideal setup for ML solutions long term, but imho it should be the de facto environment for folks starting with part 1 of the course.

On the contrary, my experience with Colab has been quite good. Most often had 11G memory to myself.

Also, @wgpubs, Colab is backed by a K80 GPU, similar to a p2.xlarge instance, so the speed is similar. However, for faster training times, you'd want a p3.2xlarge (whose V100 GPU has 16 GB of memory).

EDIT: Refer to this link for additional details: https://stackoverflow.com/questions/48750199/google-colaboratory-misleading-information-about-its-gpu-only-5-ram-available
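
To see how much GPU memory a given session actually has, one option is to ask `nvidia-smi` from a notebook cell. The sketch below assumes `nvidia-smi` is on the PATH of the runtime; the function names are hypothetical, and the reported "used" figure is device-wide, so it may include memory held by other processes on the same card.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_line):
    """Parse one 'total, used' line (in MiB) into (total, used, free)."""
    total, used = (int(field.strip()) for field in csv_line.split(","))
    return total, used, total - used


def query_gpu_memory():
    """Return (total, used, free) in MiB for the first GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tooling in this runtime
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total,memory.used",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
    except subprocess.CalledProcessError:
        return None  # nvidia-smi exists but could not talk to a GPU
    lines = out.strip().splitlines()
    return parse_gpu_memory(lines[0]) if lines else None


if __name__ == "__main__":
    print(query_gpu_memory())
```

Running it at the start of a session gives a rough idea of whether you got the whole card or only a slice of it before you start training.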

Yeah, a good place to get familiar with the notebook features before spinning up an hourly Paperspace/AWS instance.

Note that the version of fastai in pip is pretty old at the moment.

A nifty little improvement to the script to address the older version of the library on pip is to make this change:
! pip install https://github.com/fastai/fastai/archive/master.zip instead of ! pip install fastai

Thank you for your suggestion! I have edited the script.

Thank you @sourabhd for this resource!
Really useful to first work through the notebook once, before powering up Paperspace.

I think it would be a good first step to follow after each week’s lesson! :smile:

I’m glad it helped. After using Google Colab you will never look back at Paperspace! :rofl:

The Paperspace GPUs are much faster, although the price isn’t so good!

Definitely @jeremy! Here in India most students cannot afford to pay. Even personal GPUs are very costly. I have done my entire Part 1 of fast.ai on Google Colab and it didn’t disappoint me.
I really enjoyed the first live session. Thank you so much for making a wonderful course!

Hey @sourabhd. I was initially trying to implement whatever I can in Colab as well but somewhere around lesson 4 or 5, it just started taking too long and memory issues were very annoying. What I wanted to know was, how long did training models (say the language model in lesson 4) take for you? Did it continue running for that long without interruption?

Hello @sourabhd, this is my first time using Google Colab.
The data files are not found when I run the last three lines:

    tar: data/VOCtrainval_06-Nov-2007.tar: Cannot open: No such file or directory
    tar: Error is not recoverable: exiting now
    unzip:  cannot find or open data/PASCAL_VOC.zip, data/PASCAL_VOC.zip.zip or data/PASCAL_VOC.zip.ZIP.

Where have I gone wrong?

![colab|690x387](upload://vj8lTqWKToU8zlXUimlRWxBpSQI.jpg)

Hi Nandutu,

I just figured out that the wget command is downloading the tar files in the “root” directory, instead of the “data” directory.

Simply change the two extraction lines to the following:

    !tar -xf VOCtrainval_06-Nov-2007.tar -C data/
    !unzip PASCAL_VOC.zip -d data/

Also, once you have extracted everything, you will need to move the PASCAL_VOC files to the data/ directory to run the notebook as expected.

    !mv data/PASCAL_VOC/* data/

No need to change anything else other than this.
It will work as expected then :smile:
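
If you are not sure where wget put the archives, a small Python cell can look in both places before extracting. This is just a sketch under assumptions: `find_archive` and `extract_archive` are hypothetical helper names, and it only checks data/ and the current directory.

```python
import tarfile
import zipfile
from pathlib import Path


def find_archive(name, search_dirs=("data", ".")):
    """Return the first existing path to `name` in search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def extract_archive(path, dest="data"):
    """Extract a .zip or tar archive into dest, creating dest if needed."""
    path = Path(path)
    Path(dest).mkdir(parents=True, exist_ok=True)
    if path.suffix == ".zip":
        with zipfile.ZipFile(str(path)) as archive:
            archive.extractall(str(dest))
    else:
        with tarfile.open(str(path)) as archive:
            archive.extractall(str(dest))


if __name__ == "__main__":
    for name in ("VOCtrainval_06-Nov-2007.tar", "PASCAL_VOC.zip"):
        found = find_archive(name)
        if found is None:
            print(name, "not found; re-run the wget step")
        else:
            extract_archive(found)
```

This way the extraction works whether the files landed in data/ or in the directory the notebook runs from.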

I had used the -d flag instead of -P for the download path. I have edited the code and it should work now!
