Colaboratory and Fastai

atulkrishna · March 17, 2018, 9:12am

Hello intheroom,
In the top left corner there is a File menu, where you can select the “Upload notebook” option.
For more information, see this post:
http://forums.fast.ai/t/colaboratory-and-fastai/10122/76

ronaldokun · March 18, 2018, 12:19am

Hey guys,

Check this out:

This article lists the steps for setting up the fast.ai course on Colab, but similar steps work for any ML project: http://bit.ly/2DyqMYd

It automates environment setup and the dataset and code downloads on Colab.

This addresses the problem that the Colab environment is not persistent (i.e. every time you come back you must install the libraries and download the datasets again).
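
A minimal sketch of that idea, not the script from the article: since a fresh Colab VM starts without your extra libraries, a first notebook cell can check what is missing and reinstall only that. The helper names `missing_packages` and `ensure_installed` are hypothetical, and the sketch assumes each package's import name matches its pip name, which is not always true.

```python
import importlib.util
import subprocess
import sys


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_installed(names):
    """Install any missing packages with pip and return the list that was installed."""
    todo = missing_packages(names)
    if todo:
        # Use the interpreter's own pip so packages land in the notebook's environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *todo])
    return todo
```

Calling `ensure_installed(["fastai"])` at the top of a notebook would then do nothing on a warm runtime and reinstall on a fresh one.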

Cheers

keratin · March 18, 2018, 3:47am

I don’t think you need to do so much. I’ve done lessons 1-3 on Colab and sure, many problems do occur in the process but once I figured it out for lesson 1, it wasn’t very hard. Besides, the troubleshooting helped me learn a lot.
It can be a bit intimidating for beginners and take up some extra time as well but it’s worth it.
Everyone on this thread is very helpful as well.

yohei · March 27, 2018, 7:39am

When I try pip install fastai on Colab,

it always hangs or downloads far too slowly.
More precisely, it always stops at

Collecting bcolz (from fastai)
Using cached bcolz-1.2.0.tar.gz

I have no problem installing other libraries such as pytorch or kaggle.

keratin · March 27, 2018, 8:07am

@yohei Try !pip install --upgrade git+https://github.com/fastai/fastai.git

yohei · March 27, 2018, 8:34am

Thank you so much

santteegt · April 2, 2018, 3:52am

Hello,

Did you try running the lesson1_rxt50.ipynb notebook in Google Colab? I’m getting the following error:

FileNotFoundError: [Errno 2] No such file or directory: ‘/usr/local/lib/python3.6/dist-packages/fastai/weights/resnext_50_32x4d.pth’

I know that we need to download the weights.tgz file from http://files.fast.ai/models/weights.tgz, but I don’t know exactly where to put it. On Paperspace/Crestle/AWS we work with the notebook inside the cloned repository folder, but on Google Colab the notebook is loaded from my Google Drive and not from the local fastai git repo.
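
One possible approach, sketched under assumptions rather than confirmed in this thread: the error message already names the directory fastai reads from, so the archive could be unpacked next to the installed package instead of into a cloned repo. The sketch assumes weights.tgz unpacks into a top-level `weights/` folder, and the helper names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

WEIGHTS_URL = "http://files.fast.ai/models/weights.tgz"  # URL quoted in the post


def package_dir_for(missing_file):
    """Given the path from the FileNotFoundError, return the fastai package directory.

    Assumes the layout <package>/weights/<file>.pth shown in the error message.
    """
    return Path(missing_file).parent.parent


def unpack_weights(archive_path, package_dir):
    """Extract the archive into package_dir and return the member names it contained."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(package_dir)
    return names


def fetch_and_unpack(missing_file, archive_path="weights.tgz"):
    """Download the weights archive and unpack it where fastai looked for the file."""
    urllib.request.urlretrieve(WEIGHTS_URL, archive_path)
    return unpack_weights(archive_path, package_dir_for(missing_file))
```

Passing the path from the error message to `fetch_and_unpack` would, if the assumption about the archive layout holds, put the `.pth` files where the notebook looks for them. The download has to be repeated whenever the Colab VM is reset.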

cedric · April 6, 2018, 6:23pm

Extra Google Colab Tips:

As you know, every few hours the kernel gets disconnected and local files are removed. So you need to automatically back up your models to Google Drive during training to save your progress and be able to re-use the model later. Here’s a script that helps do that if you are using Keras: https://github.com/Zahlii/colab-tf-utils
A repository of useful scripts for adding common services to non-persistent Colab VM sessions: https://github.com/mixuala/colab_utils
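
If you are not using Keras, the same backup idea can be sketched in plain Python. This is a hedged illustration, not code from either repository: it assumes Google Drive is already mounted in the Colab VM and that `backup_dir` points to a folder inside that mount. The function name `backup_new_files` is hypothetical.

```python
import shutil
from pathlib import Path


def backup_new_files(model_dir, backup_dir, pattern="*"):
    """Copy files matching `pattern` that are missing from, or newer than, the backup.

    Returns the sorted names of the files that were copied.
    """
    model_dir, backup_dir = Path(model_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(model_dir.glob(pattern)):
        if not src.is_file():
            continue  # skip sub-directories
        dst = backup_dir / src.name
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(src.name)
    return copied
```

Calling this after each epoch (or every N minutes) limits how much training is lost when the kernel disconnects.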

pierreguillou · April 7, 2018, 9:27pm

If you use Clouderizer (notebooks) with Google Colab (GPU), your notebooks + data are saved in your Clouderizer drive.

cedric · April 8, 2018, 4:39pm

Thank you for sharing. This is the first time I’ve heard about Clouderizer. Have you tried using it? Is there a guide somewhere that we can refer to for getting started with Clouderizer + Google Colab? I couldn’t find one in their docs and blog posts. Thank you in advance.

pierreguillou · April 8, 2018, 11:12pm

Hi @cedric,

Using Clouderizer helps a lot for running the fastai notebooks (for free) on a Google Colab GPU.

Best blog post : Fastest way to setup Fast.ai course notebooks , for free — using Google Colab and Clouderizer from @prakashgupta

… and “FastAI with Clouderizer – Get started in minutes”.

Have fun

Blanche · April 15, 2018, 3:23pm

How can I upload a custom dataset with this setup (as an additional dataset, so I won’t have to build it twice)? I’ve figured out that I can provide a link to the data in the project settings. I’ve looked for a data folder in the fast.ai Clouderizer drive, but I don’t see one. Also, when I try to log into the console I get “permission denied” (using either an empty password or my Clouderizer password), so I can’t look it up there either.

pierreguillou · April 15, 2018, 7:22pm

There are at least 2 possibilities I think (cc @prakashgupta):

1. Open a Terminal window from the leaderboard of your project in Clouderizer, use the clouderizer (with a small c) password, and use for example the wget command to download your dataset file.
2. Open a Jupyter notebook in your Clouderizer and use !wget (with an exclamation point in front of wget) to download your dataset file.

If it does not work, you can ask in the Clouderizer forum as well.
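
The second option can also be done from Python when wget is not convenient. A minimal sketch using only the standard library; `download_dataset` is a hypothetical helper name, and any URL you pass in is your own (none is given in this thread).

```python
import urllib.request
from pathlib import Path


def download_dataset(url, dest_dir="data"):
    """Download `url` into `dest_dir`, keeping the file name from the URL path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Take the last path segment as the file name (URLs with query strings need extra care).
    filename = url.rstrip("/").split("/")[-1]
    target = dest_dir / filename
    urllib.request.urlretrieve(url, target)
    return target
```

The function returns the local path, so the result can be passed straight to whatever unpacks or loads the dataset.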

Celia · April 27, 2018, 4:28pm

I cannot run the course notebooks on Colab at all!!! I don’t know why. For lesson 1, after executing
!pip install fastai, when I execute ‘from fastai.imports import *’ I get a lot of errors such as:
module ‘scipy’ has no attribute ‘sparse’

I mean I didn’t do anything. I just want to load all libraries to run the notebook.

Has anyone else run into the same problem?

Blanche · April 28, 2018, 11:05am

Maybe consider using clouderizer? I’ve got it working within minutes.

prakashgupta · May 2, 2018, 7:22pm

A third option would be to use Clouderizer Drive. If the data folder is not visible on Drive, we can always create one under our project. Anything we upload to the data folder gets pushed to the machine running the Clouderizer project within 2 minutes. This article describes the 2-way sync of Clouderizer folders with Drive.

Also, specifically for Kaggle datasets, there is a very nice integration with the Kaggle API. We just need to specify the competition name in our project settings and Clouderizer will automatically download the dataset from the Kaggle site.
Here is a detailed article about this feature.

https://medium.com/@prakash_31206/kaggle-on-google-colab-easiest-way-to-transfer-datasets-and-remote-bash-e54c64054faa

pierreguillou · May 4, 2018, 9:32pm

Hi @prakashgupta, what is the size of the Clouderizer Drive for a free account?

prakashgupta · May 5, 2018, 2:23am

As much as is available in the linked Google Drive. Clouderizer itself does not impose any limits.

cedric · May 8, 2018, 11:07am

Hi Prakash. Thank you for the service.

I know this is not the right place for Clouderizer support, but I have 2 questions.

The automatic 2-way data sync to Google Drive is a life-saver. However, during my testing I noticed a slight delay of about 30s (can’t confirm). Is this supposed to happen? Why is it not real-time? I need to know so I can plan when to run a script that saves my model training state (data) and prevents loss.

The second issue is that when I create a new Jupyter notebook (default filename ‘Untitled.ipynb’) and then rename it to something else, an orphan file Untitled.ipynb is left behind in Google Drive.

Thanks.

stas · May 8, 2018, 5:31pm

OK, I finally managed to tweak the first lesson so that it completes on Colab; see the notes and notebooks here: