Fast.ai with Google Colab

How did you save your work when you edited the fast.ai notebooks, or if you created a new notebook from scratch? I find that if I save my notebooks, stop the project, then come back later and start it all up again, my work has all disappeared.

Hello Kyap,

Did you connect your Google Drive account to your Project at Clouderizer?

Everything in the folders:

  • fast.ai (code)
  • out

is saved in Drive inside a folder named clouderizer.

The Clouderizer platform syncs these folders every few minutes. Wait at least 5 minutes after saving the notebook before closing your Clouderizer project, or just check your Google Drive to see whether the file was updated.
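
If you are not using Clouderizer, you can also mount Drive yourself from a Colab cell and copy your work over; a minimal sketch (the fastai-work folder name is just an example):

from google.colab import drive

# Mount Google Drive at /content/drive (prompts for an authorization code)
drive.mount('/content/drive')

# Copy whatever you want to keep into Drive; 'fastai-work' is a hypothetical folder
!mkdir -p "/content/drive/My Drive/fastai-work"
!cp -r data "/content/drive/My Drive/fastai-work/"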

2 Likes

I’ve been working through all the lessons in Colab, and haven’t set up Clouderizer.

Here’s my notebook explaining how to use Colab for Fastai (including importing data from various places, which is where I’ve had the most trouble so far!). I’ll keep adding cells for each lesson as I work through it.

The basics:

Always add a cell to the top of the notebook like this:

!pip3 install fastai
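
If you want a quick sanity check that the install worked (just a convenience check, not from the original notebook):

# Confirm the package is installed and importable
!pip3 show fastai
import fastai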

How to import data from fastai URLs (lessons 1, 3, 4):

For lesson 1:

# Get the Dogs & Cats data, unzip it, and put it in the 'data' directory:
!wget http://files.fast.ai/data/dogscats.zip && unzip dogscats.zip -d data/

# Check to make sure the folders all unzipped properly:
!ls data/dogscats

For lesson 3:

# Get the Rossmann data and make a directory to put it in:
!wget http://files.fast.ai/part2/lesson14/rossmann.tgz && mkdir -p ~/data/rossmann

# Unzip the .tgz file, and put it in the right directory:
# -x to extract
# -v for verbose    # NOTE: I usually turn this off; it prints a lot...
# -z to decompress through gzip
# -f for file (must come last, just before the file name)
# -C to extract the contents into a different directory
!tar -xzf rossmann.tgz -C ~/data/rossmann/

# Make sure the data is where we think it is:
!ls ~/data/rossmann

For lesson 4:

# Get the IMDB data:
!wget http://files.fast.ai/data/aclImdb.tgz

# Unzip the tgz file, and put it in the right directory:
# -x to extract
# -v for verbose    # NOTE: I usually turn this off; it prints a lot...
# -z to decompress through gzip
# -f for file (must come last, just before the file name)
# -C to extract the contents into a different directory
# (mkdir -p in case the data directory doesn't exist yet)
!mkdir -p data && tar -xvzf aclImdb.tgz -C data/

# Make sure the data is where we think it is:
!ls data/aclImdb

How to import data from Kaggle using the Kaggle CLI (lesson 2):
I found this forum post very useful.

# Install the Kaggle API
!pip3 install kaggle

# Import kaggle.json from Google Drive
# This snippet will output a link which needs authentication from any Google account
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()

drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)

request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while not done:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
# chmod takes an octal mode: 0o600 = read/write for the owner only
os.chmod(filename, 0o600)

# List the files for the Planet data 
!kaggle competitions files -c planet-understanding-the-amazon-from-space

# Download the data from Kaggle
# -c: competition name
# -f: which file you want to download
# -p: path to where the file should be saved
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f test-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv.zip -p ~/data/planet/

# In order to unzip the .7z files, we need to install p7zip
# This was helpful: http://forums.fast.ai/t/unzipping-tar-7z-files-in-google-collab-notebook/14857/4
!apt-get install p7zip-full

# Decompress the 7zip files
# -d: decompress (rather than compress)
!p7zip -d ~/data/planet/test-jpg.tar.7z
!p7zip -d ~/data/planet/train-jpg.tar.7z

# Unzip the .tar files 
!tar -xvf ~/data/planet/test-jpg.tar
!tar -xvf ~/data/planet/train-jpg.tar

# Move the unzipped folders into data/planet/
!mv test-jpg ~/data/planet/ && mv train-jpg ~/data/planet/

# Unzip the regular file
!unzip ~/data/planet/train_v2.csv.zip -d ~/data/planet/

# Make sure everything looks as it should:
!ls ~/data/planet/

Finally, if you’re worried about how much of the GPU is available, there’s a cell you can run that checks the % utilization of your current GPU. See the Stack Overflow link that Sebastian posted earlier on in this thread.
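
For example, a quick check (a simple nvidia-smi query, not necessarily the exact snippet from that link):

# Show current GPU utilization and memory usage
!nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv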

Hope this helps some of you get started more quickly!

35 Likes

great and cool!

Hello everyone, and thank you for your helpful comments.
I am new to fast.ai and have been trying to find the best way to get started with GPU acceleration. Since I have a gaming laptop with an Nvidia GeForce GTX 1060, I always have the option of running the notebooks locally. But since it is more convenient to run in the cloud, I thought of giving Google Colab a try. So far, I have been unsuccessful in getting the session to run with a GPU and always get the error:

Failed to assign a backend
No backend with GPU available. Would you like to use a runtime with no accelerator?

My guess is that I’m late to the party and everyone is using the GPUs. If you think this is the case, do you suggest any other free cloud-computing option, or should I install Ubuntu?

You can try Kaggle Kernels; they include free GPUs now.

4 Likes

Hi William,

Thank you for the tip. Actually I just found that out right when you posted. Cheers!

1 Like

thank you for sharing this.

You can also import from Kaggle like this:

!pip install kaggle-cli

and then
!kg download -c dog-breed-identification -u yourusername -p password

2 Likes

arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)

In Colab this piece of code takes about half an hour. Can anyone tell me how to make it run in less time?

I’m adding modified Google Colab compatible notebooks here: https://github.com/bhoomit/fastai-dl1-colab

2 Likes

Hi, thanks, but this doesn’t work for me; it won’t import fastai.transforms.

Thank you for this. The one thing is that you have to wait a few minutes to install fastai + the dataset.
I wonder, once I set it up, can I just share the notebook so that people don’t have to download fastai + the dataset?

Hi William, thank you for doing this. Do you see any advantage to using Kaggle vs. Google Colab for fastai?

Also, I wonder why you need to modify the dataset and the way it was structured? Can we just leave it as is?

Danny

I’m planning to attempt working through the course just in Colab as well. I was able to get the first lesson imports running in a colab notebook with a few minor changes, and without pip installing fastai.

I have the notebook set to Python 2 and using their GPU. I began by running the first 3 blocks of the official course nb (i.e. new cells in a fresh nb, using the first 3 cells from the official lesson1 nb)

Then, in order to upload the models, I downloaded (or cloned) the course GitHub repo locally, then used this block of upload code to bring the files into Colab so they can be imported in the notebook (separate cells for each below; see the *** note below before attempting this):

from google.colab import files
src = list(files.upload().values())[0]
open('vgg16.py', 'wb').write(src)
import vgg16

from google.colab import files
src = list(files.upload().values())[0]
open('vgg16bn.py', 'wb').write(src)
import vgg16bn

from google.colab import files
src = list(files.upload().values())[0]
open('utils.py', 'wb').write(src)
import utils

***Note: before doing this, be aware that Google Colab uses Keras v2, so I made some minor adjustments to the files prior to uploading.

In utils.py (on your machine) change line 42 to this: from keras.regularizers import l2, l1
and change line 45 to this: from keras.layers import deserialize as layer_from_config

in vgg16.py (on your machine) change line 177 in the fine tune function to this: self.ft(batches.num_classes)
and change line 213 in the fit function to this:

self.model.fit_generator(batches, samples_per_epoch=batches.samples, nb_epoch=nb_epoch,
                         validation_data=val_batches, nb_val_samples=val_batches.samples)

OK, now you can use those little upload script cells to bring the altered files into the notebook. Finally, run this in a cell:

reload(utils)
from utils import plots

Doing that got the next section of the official notebook code to work for me (also change the batch size from 64 to 4 when you first run it). I realize I could probably just pip install fastai, but I kind of like building it by hand with manually uploaded files, so I know it will be reproducible, and so I know where the dependencies are causing issues when things like Keras change versions.

Edit! It’s also important to actually bring the zip data into your Colab environment if you do it this way. That can be achieved like this:

!wget http://files.fast.ai/data/dogscats.zip && unzip -qq dogscats.zip -d data/

you’ll then see the data in the Colab file browser; also make sure to adjust path (from the official code) by running a cell with: path = "data/dogscats/"

Sorry for hijacking this thread, as I am not able to find the button to create a new post.
I have been trying to use Crestle to create an instance, but it’s been stuck for over an hour. I refreshed a couple of times but it doesn’t move forward. Any idea how to solve this issue?

As abdullah said, this failed when importing fastai.transforms. (I ran !pip3 install "fastai<1" and then did the usual imports.)

NameError                                 Traceback (most recent call last)
<ipython-input-8-94ce375e95cb> in <module>()
      1 from fastai.imports import *
----> 2 from fastai.transforms import *
      3 from fastai.conv_learner import *
      4 from fastai.model import *
      5 from fastai.dataset import *

/usr/local/lib/python3.6/dist-packages/fastai/transforms.py in <module>()
      3 from enum import IntEnum
      4 
----> 5 def scale_min(im, targ, interpolation=cv2.INTER_AREA):
      6     """ Scales the image so that the smallest axis is of size targ.

NameError: name 'cv2' is not defined
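
My guess (untested) is that OpenCV just isn’t available in that runtime, so cv2 never gets imported; something like this might fix it:

# Untested guess: install OpenCV so fastai.transforms can find cv2
!pip3 install opencv-python
import cv2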

Hi @bhoomit,

Using your notebook, I am getting a "module 'torch' has no attribute 'float32'" error at
from fastai.transforms import *.

Any suggestions? Thanks.
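
Edit: one hedged guess on my own error: torch.float32 was only added in PyTorch 0.4, so this probably means an older torch is installed in the runtime. Upgrading might help:

# Guess at a fix: torch.float32 first appeared in PyTorch 0.4
# (you may need to restart the Colab runtime after upgrading)
!pip3 install "torch>=0.4"
import torch
print(torch.__version__)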

I don’t think you can really get it to run faster on Colab. I guess Colab only gives us access to a modest GPU. Maybe you can try a bigger GPU on Google Cloud Platform.
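
If you want to see which GPU Colab actually gave you, you can check from PyTorch (which fastai runs on top of):

import torch

# True if a CUDA device is visible, then the device name (e.g. Tesla K80 on Colab)
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))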

I have gotten part of lesson 1 working on Colab:

https://colab.research.google.com/drive/1wmMFcfvEQlu31YBQ3IoEQn-Rsylscrkj

as of 10-12-18 :slight_smile:

I used a lot of this code:
https://colab.research.google.com/github/corykendrick/fastai_in_colab/blob/master/Using_Google_Colab_for_Fastai.ipynb#scrollTo=aBlnbQ-wJNCY

I do have some type errors being thrown; if anyone can suggest a fix, thanks in advance.

Update: I found this linked on the wiki and wish I had seen it two weeks ago. I first heard of fast.ai from the announcement of the new version 1 … on Twitter… my mistake for not reading up first.


Only use conda for the courses, not pip …

Really enjoying the videos… thanks!

2 Likes