Fast.ai with Google Colab


(Vishal Pani) #1

For personal reasons I am not able to create an account on Paperspace or other cloud platforms. I heard that Google provides a free GPU through the Colab platform. Will that be sufficient to complete both parts of the deep learning course?
It provides a Tesla K80 GPU and 13 GB of RAM.


(Gil Rosenthal) #2

I’ve successfully completed part 1 of the Deep Learning course using Google Colab. It took a little fiddling around, but I think I have found the 1-run script to make it work. Here are the steps I take:

  1. Upload the desired fast.ai notebook to my Google Drive, then open it in Colab

  2. Change runtime type to Python 3, and then change Hardware Accelerator to GPU

  3. When the runtime is connected, add this block of code to the top: https://gist.github.com/gilrosenthal/58e9b4f9d562d000d07d7cf0e5dbd840

  4. Run it, then the rest of the notebook should run fine!

NB: You need to run this script any time you connect to a new runtime or open the notebook
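
To confirm that step 2 took effect before running the setup script, a cell like the one below can help. This is only a minimal sketch and is not the content of the gist above: the helper name `describe_runtime` is made up for illustration, and it assumes that a GPU runtime exposes the standard `nvidia-smi` tool on the PATH.

```python
import shutil
import subprocess
import sys


def describe_runtime():
    """Report the Python major version and any GPUs visible through nvidia-smi.

    Hypothetical helper: assumes a GPU runtime has nvidia-smi on the PATH.
    """
    info = {
        "python_major": sys.version_info.major,
        "has_nvidia_smi": shutil.which("nvidia-smi") is not None,
        "gpu_names": [],
    }
    if info["has_nvidia_smi"]:
        # Ask the driver for one GPU name per line, without a header row.
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
        if result.returncode == 0:
            info["gpu_names"] = [
                line.strip() for line in result.stdout.splitlines() if line.strip()
            ]
    return info


print(describe_runtime())
```

If `gpu_names` comes back empty, the Hardware Accelerator setting is probably still set to None.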


(Ronaldo da Silva Alves Batista) #3

Hello @vishal.pani,

I’m currently in the second half of Part 1 of the Deep Learning Course. It’s perfectly possible to complete the whole course just using Colab.

In fact, Clouderizer is a great resource that gives you an experience similar to using AWS, Paperspace or Google Cloud. It uses Colab as the backend platform for the GPU and Google Drive as your permanent disk. So you get a free computation engine from Colab and a free permanent disk for saving datasets and models, because you can connect your Google Drive account to it.

The team at Clouderizer has already created a Community Project for fast.ai that does all the configuration for you and loads the updated libraries each time you start the project. It can also load Kaggle datasets at startup and install Python or Linux libraries.

You can sign up and literally in minutes begin running the notebooks with a terminal available in the browser.

In fact it’s even better than the paid cloud products mentioned, because we can access the project, terminal and notebooks from anywhere using just the browser, with no installation, no configuration, no sweat and not a single dollar spent.

I’m not affiliated with Clouderizer at all, by the way. I highly recommend it because it’s awesome for studying. I even have some credits in Google Cloud, but I prefer to use Clouderizer because I can access it from any crappy PC during the day when I have some free time.

Check detailed instructions in these links:

Cheers,

Happy Learning


(Vishal Pani) #4

Thank you for the helpful responses @grosenthal and @ronaldokun!
Now I can dive into the course without the fear of crashing midway!


(Sunil Tapashetti) #5

I am facing an issue with setting up fast.ai on Clouderizer. The Next button is not visible on the SETUP page. Any workarounds?


(Vishal Pani) #6

I was able to set it up without any problems. Can you send a screenshot?


(Sunil Tapashetti) #7

It was a simple case of the page being scrolled up. A prompt response from Prakash of Clouderizer solved the issue.


(Ronaldo da Silva Alves Batista) #8

Prakash is very helpful and always available to solve any issue.


(Sebastian Gonzalez Aseretto) #9

How did anyone solve the GPU memory issue that Google Colab has?


(Vishal Pani) #10

@sgaseretto Did you try to run the code? I went through the first notebook without any problem.


#11

The memory issues come down to luck (or location). Basically, some percentage of users are allocated only 5% of the GPU RAM and hence can’t use Colab. The rest of the users get the full 12 GB of GPU RAM (like me :slight_smile:).


#12

How did you save your work when you edited the fast.ai notebooks, or if you created a new notebook from scratch? I find that if I save my notebooks, stop the project, then come back later and start it all up again, my work has all disappeared.


(Ronaldo da Silva Alves Batista) #13

Hello Kyap,

Did you connect your Google Drive account to your Project at Clouderizer?

Everything in the folders:

  • fast.ai ( code )
  • out

is saved in your Google Drive inside a folder named clouderizer.

The Clouderizer platform syncs these folders every few minutes. After saving the notebook, wait at least 5 minutes before closing your Clouderizer project, or just check your Google Drive to see if the file was updated.


(Cory Kendrick) #14

I’ve been working through all the lessons in Colab, and haven’t set up Clouderizer.

Here’s my notebook explaining how to use Colab for Fastai (including importing data from various places, which is where I’ve had the most trouble so far!). I’ll keep adding cells for each lesson as I work through it.

The basics:

Always add a cell to the top of the notebook like this:

!pip3 install fastai

How to import data from fastai URLs (lessons 1, 3, 4):

For lesson 1:

# Get the Dogs & Cats data, unzip it, and put it in the 'data' directory:
!wget http://files.fast.ai/data/dogscats.zip && unzip dogscats.zip -d data/

# Check to make sure the folders all unzipped properly:
!ls data/dogscats

For lesson 3:

# Get the Rossmann data and make a directory to put it in:
!wget http://files.fast.ai/part2/lesson14/rossmann.tgz && mkdir -p ~/data/rossmann

# Unzip the .tgz file, and put it in the right directory:
# -x to extract
# -v for verbose    # NOTE: I usually turn this off; it prints a lot...
# -z to decompress with gzip
# -f for file (should come last, just before the file name)
# -C to extract the contents into a different directory
!tar -xzf rossmann.tgz -C ~/data/rossmann/

# Make sure the data is where we think it is:
!ls ~/data/rossmann

For lesson 4:

# Get the IMDB data:
!wget http://files.fast.ai/data/aclImdb.tgz

# Unzip the tgz file, and put it in the right directory:
# -x to extract
# -v for verbose    # NOTE: I usually turn this off; it prints a lot...
# -z to decompress with gzip
# -f for file (should come last, just before the file name)
# -C to extract the contents into a different directory
!mkdir -p data && tar -xvzf aclImdb.tgz -C data/

# Make sure the data is where we think it is:
!ls data/aclImdb
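
The same download-and-extract pattern can also be done from Python instead of shell commands. The following is a rough sketch using only the standard library. The helper names `fetch` and `extract` are invented for this example and are not part of fastai.

```python
import os
import tarfile
import urllib.request
import zipfile


def fetch(url, dest_dir="."):
    """Download url into dest_dir unless it is already there; return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
    return local_path


def extract(archive_path, dest_dir):
    """Extract a .zip or .tar/.tgz archive into dest_dir, creating it if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
    elif tarfile.is_tarfile(archive_path):
        # Default read mode detects gzip compression automatically.
        with tarfile.open(archive_path) as tf:
            tf.extractall(dest_dir)
    else:
        raise ValueError("Unsupported archive: " + archive_path)
    # Same role as the `ls` check in the shell version.
    return sorted(os.listdir(dest_dir))


# Example for lesson 4 (needs network access, so it is left commented out):
# print(extract(fetch("http://files.fast.ai/data/aclImdb.tgz"), "data"))
```

Because `extract` creates the target directory itself, it does not depend on an earlier lesson having already created `data/`.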

How to import data from Kaggle using the Kaggle CLI (lesson 2):
I found this forum post very useful.

# Install the Kaggle API
!pip3 install kaggle

# Import kaggle.json from Google Drive
# This snippet will output a link which needs authentication from any Google account
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()

drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)

request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
os.chmod(filename, 0o600)  # octal mode: owner read/write only
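
Before calling the Kaggle CLI, it can be worth checking that the downloaded key file looks right. The sketch below is an optional sanity check, not part of the Kaggle API. The function name `check_kaggle_credentials` is hypothetical, and it assumes the file is the standard `kaggle.json` with `username` and `key` fields.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Validate a kaggle.json file and restrict it to owner read/write.

    Hypothetical helper: assumes the standard "username" and "key" fields.
    """
    with open(path) as f:
        creds = json.load(f)
    missing = [field for field in ("username", "key") if field not in creds]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    # The mode must be octal: 0o600, not the decimal number 600.
    os.chmod(path, 0o600)
    # Return the resulting permission bits so the caller can confirm them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

In the cell above this would be called as `check_kaggle_credentials(filename)`.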

# List the files for the Planet data 
!kaggle competitions files -c planet-understanding-the-amazon-from-space

# Download the data from Kaggle
# -c: competition name
# -f: which file you want to download
# -p: path to where the file should be saved
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f test-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv.zip -p ~/data/planet/

# To unzip the 7z files, we need to install p7zip (-y answers the confirmation prompt)
# This was helpful: http://forums.fast.ai/t/unzipping-tar-7z-files-in-google-collab-notebook/14857/4
!apt-get install -y p7zip-full

# Unzip the 7zip files
# -d: decompress the given .7z file (p7zip removes the original archive afterwards)
!p7zip -d ~/data/planet/test-jpg.tar.7z #-oc:/data/planet
!p7zip -d ~/data/planet/train-jpg.tar.7z #-oc:/data/planet

# Unzip the .tar files 
!tar -xvf ~/data/planet/test-jpg.tar
!tar -xvf ~/data/planet/train-jpg.tar

# Move the unzipped folders into data/planet/
!mv test-jpg ~/data/planet/ && mv train-jpg ~/data/planet/

# Unzip the regular file
!unzip ~/data/planet/train_v2.csv.zip -d ~/data/planet/

# Make sure everything looks as it should:
!ls ~/data/planet/

Finally, if you’re worried about how much of the GPU is available, there’s a cell you can run that checks the % utilization of your current GPU. See the Stack Overflow link that Sebastian posted earlier on in this thread.
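
As a rough illustration of such a check (this is not the Stack Overflow snippet, just a sketch), the cell below reports how much GPU memory is in use rather than compute utilization. It assumes `nvidia-smi` is on the PATH and reports memory in MiB. The function names are made up for this example.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (MiB) from nvidia-smi into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "used_mib": used,
            "total_mib": total,
            "percent_used": round(100 * used / total, 1),
        })
    return gpus


def gpu_memory():
    """Return memory stats for each visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []


print(gpu_memory())
```

If `total_mib` is close to 12 GB and `used_mib` is small right after connecting, you have most of the card to yourself.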

Hope this helps some of you get started more quickly!