Colaboratory and Fastai

I have the following problem:

OSError: [Errno 5] Input/output error: “data/dogscats/tmp/x_act_resnet34_0_224.bc/data/__5.blp”

Every time I run it, the number X in __X.blp is different. If I open the directory, there are other __X.blp files.

Any suggestions?

Thanks a lot.

I want to upload my data into my current notebook. I tried putting it on a file-hosting site and then downloading it via wget https://dl.dropboxusercontent.com/content_link/GjAxYko7UsfBU2Iio68DZ15qAOsWzVb20sKDyUh1yFfGFvQ1tR3EJeXfb2zio7Ia/file?dl=1 && unzip blackswhites.zip -d data
This doesn't work. How can I get the data into my current environment?

It's quite the opposite in my case.

2 Likes

Can anyone share a way to use kaggle-cli [https://github.com/floydwch/kaggle-cli] to load data directly from within a Colaboratory notebook?

I'm doing the Dog Breeds assignment.

Ideally I'd like to transfer the data directly from Kaggle to Colab, without going through my machine, as the uplink on my home internet is very slow and the files are taking a long time to get to Colab.

Actually I found a solution to Kaggle -> Colab direct transfer.

I used the CurlWget extension Jeremy describes here [https://youtu.be/9C06ZPF8Uuc?t=744] and then invoked !wget ... from Colab to download straight into my Google Drive. Extremely fast transfer, all within Google Cloud.
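Roughly, the pattern I used looks like this (the actual URL is whatever CurlWget generates for you, and the dogbreeds folder on Drive is just an example):

from google.colab import drive
drive.mount('/content/drive')  # authorise Colab to read/write your Google Drive

# Paste the wget command that CurlWget produced, pointing the output at Drive
!mkdir -p "/content/drive/My Drive/dogbreeds"
!wget "<URL copied from CurlWget>" -O "/content/drive/My Drive/dogbreeds/train.zip"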

2 Likes

For getting started with "Colaboratory and fastai" you can follow this blog, as I have successfully done so:
[http://theailearner.com/2018/03/10/free-gpu-for-fast-ai-on-google-colab/]

2 Likes

Hello Marcus.

Did you remember to activate the GPU backend, i.e., change the runtime from CPU to GPU?

Runtime -> Change Runtime Type

The hardware accelerator must be set to GPU.

To check it, run the code:

import tensorflow as tf
# Returns a device string such as '/device:GPU:0' when a GPU runtime is
# active, or an empty string when it is not.
tf.test.gpu_device_name()

I know it sounds silly but it’s easy to forget.

Cheers,

Ronaldo

2 Likes

Hello Ben,

There is the Kaggle API key, which makes access straightforward, as if you were on Kaggle Kernels:

Install the Kaggle API: !pip install kaggle

API Credentials

To use the Kaggle API, go to the ‘Account’ tab of your user profile (https://www.kaggle.com//account) and select ‘Create API Token’. This will trigger the download of kaggle.json, a file containing your API credentials.

Place this file anywhere on your Google Drive.

With the next snippet you download your credentials to Colab, and then you can start using the Kaggle API:

from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

# Authenticate this Colab session against your Google account
auth.authenticate_user()

# Find kaggle.json on your Google Drive
drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

# The Kaggle CLI expects its credentials at this path
filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)

# Download the file from Drive in chunks and write it locally
request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))

# Restrict permissions on the credentials file (note the octal literal)
os.chmod(filename, 0o600)

Then you can use commands such as !kaggle competitions list
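For example, for the Dog Breeds competition it would be something along these lines (the competition slug and target folder are my guesses from the Kaggle page, adjust as needed):

!kaggle competitions download -c dog-breed-identification -p data/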

The Kaggle API docs have the full list of commands to submit, download data, etc.

I hope I’ve helped

Ronaldo

2 Likes

Thanks for the tip, but I'm using PyTorch, not TensorFlow.

But I am using the cuda options of PyTorch and the runtime is set to GPU.

for batch, (x, y) in enumerate(train_loader):
    # move each mini-batch onto the GPU
    x = torch.autograd.Variable(x).cuda()
    y = torch.autograd.Variable(y).cuda()
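A quick sanity check that the GPU really is visible from PyTorch, independent of my dataset:

import torch
print(torch.cuda.is_available())      # True when the GPU runtime is active
print(torch.cuda.get_device_name(0))  # name of the assigned GPU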

I'm assuming it's something specific to my dataset. It only has around 50 columns.

Hi everybody, I’m new to this forum.

I have already cloned the fast.ai repo into Google Colab, but I don't know how to open a notebook from the repo. I hope you guys can help me.

Hello intheroom,
In the top-left corner there is a File menu, from which you can select the "Upload notebook" option.
For more information, see:
http://forums.fast.ai/t/colaboratory-and-fastai/10122/76

Hey guys,

Check this out:

This article lists the steps for setting up the fast.ai course on Colab, but similar steps work for any ML project. http://bit.ly/2DyqMYd

It automates environment setup, dataset download, and code download on Colab.

This actually solves the problem that the environment in Colab is not persistent (i.e. every time you come back you must install the libraries and download the datasets again).
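Without something like this, the first cell of every session ends up looking roughly like the following (the package and dataset here are just the lesson 1 examples, adjust to your project):

!pip install -q fastai
!wget -q http://files.fast.ai/data/dogscats.zip
!unzip -q dogscats.zip -d data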

Cheers

2 Likes

I don’t think you need to do so much. I’ve done lessons 1-3 on Colab and sure, many problems do occur in the process but once I figured it out for lesson 1, it wasn’t very hard. Besides, the troubleshooting helped me learn a lot.
It can be a bit intimidating for beginners and take up some extra time as well but it’s worth it.
Everyone on this thread is very helpful as well.

When I try pip install fastai on Colab,

it always hangs or downloads very slowly.
More precisely, it always stops at

Collecting bcolz (from fastai)
Using cached bcolz-1.2.0.tar.gz

I have no problem installing other libraries such as pytorch or kaggle.

@yohei Try !pip install --upgrade git+https://github.com/fastai/fastai.git

3 Likes

Thank you so much

Hello,

Did you try running the lesson1_rxt50.ipynb notebook in Google Colab? I'm getting the following error:

FileNotFoundError: [Errno 2] No such file or directory: ‘/usr/local/lib/python3.6/dist-packages/fastai/weights/resnext_50_32x4d.pth’

I know that we need to download the weights.tgz file from http://files.fast.ai/models/weights.tgz, but I don't know exactly where to put it. On Paperspace/Crestle/AWS we work with the notebook inside the cloned repository folder, but in Google Colab the notebook is loaded from my Google Drive and not from a local fastai git repo.
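The closest I got was downloading the archive and untarring it next to the installed package, taking the path from the error message above and assuming the archive unpacks into a weights/ folder, but I'm not sure this is the intended location:

!wget http://files.fast.ai/models/weights.tgz
!tar -xzf weights.tgz -C /usr/local/lib/python3.6/dist-packages/fastai/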

Extra Google Colab Tips:

  1. As you know, every few hours the kernel gets disconnected and the local files are removed, so you need to automatically back up your models to Google Drive during training in order to save the progress and re-use the model later. Here's a script that helps us do that if you are using Keras: https://github.com/Zahlii/colab-tf-utils (a minimal sketch of the same idea is shown after this list).

  2. A repository of useful scripts for adding common services to non-persistent Colab VM sessions: https://github.com/mixuala/colab_utils
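As that minimal sketch, assuming a mounted Drive and an already compiled Keras model named model (the linked repos handle this more robustly):

from google.colab import drive
from keras.callbacks import ModelCheckpoint

drive.mount('/content/drive')  # make Google Drive visible to the Colab VM

# Write the current weights to Drive after every epoch, so a disconnected
# kernel does not throw away the training progress.
backup = ModelCheckpoint('/content/drive/My Drive/models/weights.h5')
# model.fit(x_train, y_train, epochs=10, callbacks=[backup])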

8 Likes

If you use Clouderizer (notebooks) with Google Colab (GPU), your notebooks + data are saved in your Clouderizer drive.

Thank you for sharing. This is the first time I've heard about Clouderizer. Have you tried using it? Is there a guide somewhere we can refer to for getting started with Clouderizer + Google Colab? I couldn't find one in their docs or blog posts. Thanks in advance.