Colaboratory and Fastai

2-way sync is actually triggered every 2 minutes, so it will take anywhere from 0 to 2 minutes for any updates in your code and output folders to show up on Google Drive. Similarly, anything uploaded to the data folder will be downloaded to the machine within 2 minutes. This workflow is detailed in the help article in my reply above. Sync is generally an expensive operation (especially for folders containing a large number of files), hence we chose not to sync too often.
Let me know what issue this delay of up to 2 minutes is causing for you. We are open to aligning with what the community needs.

I'm afraid the second issue is also by design. Sync back from the machine to Google Drive only adds new files and updates modified ones; it never deletes. This makes sure an accidental delete on your machine doesn't delete the data in the cloud. Again, I am open to your feedback and suggestions on this.
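
To illustrate the "add and modify, never delete" behaviour described above, here is a minimal sketch of a one-way sync in plain Python. It is not Clouderizer's implementation; the function name `sync_up` and the byte-for-byte comparison are assumptions made for the example.

```python
import filecmp
import shutil
from pathlib import Path


def sync_up(src, dst):
    """Copy new or changed files from src to dst; never delete anything in dst."""
    src, dst = Path(src), Path(dst)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        # Copy when the target is missing or its content differs.
        # A full content comparison is slow; a real tool would likely
        # compare sizes and timestamps instead.
        if not target.exists() or not filecmp.cmp(path, target, shallow=False):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel.as_posix())
    # Files that exist only in dst are left alone on purpose.
    return copied
```

With this approach, a file deleted on the machine simply stops being considered; its copy on the destination side stays where it is.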


Hi Manikanta,

On Google Colab when I try to run these fastai imports

from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

they fail with this error:

ModuleNotFoundError: No module named 'fastai'

How can I install fastai so that these imports work?

Thanks

@prakashgupta... as it is not really practical to upload large files to Google Drive, I tried downloading the data directly in the Clouderizer terminal using curlwget. The download succeeded, but I could not unzip it. I tried installing unzip using sudo, but that didn't work. Do you know a way to unzip from the Clouderizer terminal?
Thanks

Don't use sudo in Google Colab. Try installing without sudo:
apt install unzip

You can also specify the unzip package in the Clouderizer project itself. It will then be installed automatically on project start. Have a look at this doc.

You can also put your curl/wget download command, followed by the unzip command, in the Clouderizer project's startup script (Workspace tab).
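
If installing unzip is a hassle, the download and extraction can also be done from Python's standard library. This is only a sketch: the helper name `download_and_extract` is made up for illustration, and it assumes the URL points directly at a zip archive.

```python
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir):
    """Download a zip archive from url and unpack it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path component of the URL.
    archive = dest / Path(urllib.parse.urlparse(url).path).name
    urllib.request.urlretrieve(url, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

For example, `download_and_extract("http://files.fast.ai/data/dogscats.zip", "data")` should leave the unpacked dataset under the data folder.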


Thanks. That worked for me.

If you download a dataset in the Clouderizer terminal, is it temporary? I find it is gone the next time I restart Clouderizer. It is not in the data folder, although I could use it in the previous session. Do I need to download the data every session? Thanks


Hey,

were you able to fix this problem?

The data folder is synced down (Google Drive -> Machine).
The code and out folders are synced up (Machine -> Google Drive).

Please see the last section of the help article above for details.

In your case, you are downloading directly to the data folder on the machine. This will not be persisted to Google Drive. This is by design.

If you wish to persist this data, try downloading it to the code folder instead. It should then sync to Google Drive.
Don't forget to update the dataset path in your notebook to accommodate this path change.


Thanks. That helped me a lot.


Simply change the working directory.
import os
os.chdir("drive/app")

Now try to download the image folder.

I ran into this problem several times and today I found a solution. Check how much free GPU RAM you have (in my case 0% was free). If little memory is left, clean up your GPU RAM and then reconnect. You will find that your GPU RAM is 100% free again!

To check your GPU memory

!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import psutil
import humanize
import os
import GPUtil as GPU
GPUs = GPU.getGPUs()
# XXX: only one GPU on Colab, and it isn't guaranteed
gpu = GPUs[0]
def printm():
    # Report host RAM and the memory used by this notebook process.
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    # Report GPU memory as seen by GPUtil.
    print('GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB'.format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()

To clean up and cut the connection

!kill -9 -1

Hope that will help
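
The check above needs extra packages and assumes a GPU is attached. As an alternative, here is a small sketch that asks nvidia-smi directly and returns an empty list when no GPU or driver is present. The helper names `parse_gpu_memory` and `gpu_memory` are made up for this example, and the parser assumes nvidia-smi reports plain integer MiB values.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB lines (one per GPU) into a list of tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Return [(used_mib, total_mib), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


for used, total in gpu_memory():
    print(f"GPU RAM used: {used} MiB of {total} MiB ({100 * used / total:.0f}%)")
```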

Hey, from what I understood, every time you start a notebook on Colab, the whole setup and installation process has to happen again. That means you spend around an hour on installation before you can start working. Please correct me if I'm wrong.

Hey Shubham, it's not that bad. In my experience, setting up the environment takes up to 10 minutes, can be almost fully automated, and needs to be done roughly once per day (since a virtual machine can live up to 12 hours).

Google Colab is a fantastic resource. It just takes time to get used to it. For studying (and some development) I'd say it's more convenient than AWS.

The biggest drawback is the limited RAM (12GB), so you need to be aware of your notebook's memory consumption (and mitigate it) at all times.


Hey, can someone help me with how to set or update the PATH environment variable in Google Colab with Clouderizer?
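
One way to do this from inside the notebook itself, independent of Clouderizer, is to edit `os.environ`. This is a sketch only: the helper name `prepend_to_path` and the example directory are placeholders, and whether Clouderizer offers its own setting for this is not covered in this thread.

```python
import os


def prepend_to_path(directory):
    """Put directory at the front of PATH for this process and its children."""
    parts = os.environ.get("PATH", "").split(os.pathsep)
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + [p for p in parts if p])
    return os.environ["PATH"]


# Example with a placeholder directory; replace it with the one you need.
prepend_to_path("/opt/example/bin")
```

The change lasts only for the current session, and shell commands launched from the notebook afterwards should inherit the updated value.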

Simply run this before any code in your notebook:

!pip install git+https://github.com/fastai/fastai.git

So is this all we have to do to get Colab to work? Just run this command before every lesson?

Well, first you have to think of your Colab notebook as a cloud service. It comes with some packages preinstalled, like Jupyter Notebook, but not something specific like the fast.ai library.

Given that, you can use !{command-line string} to run commands directly in bash from your notebook. If you need some data, you can download it with a command line.

Example: the first lesson notebook needs the fast.ai library installed and the dogscats data folder. To set everything up correctly, I run this sequence of commands before any other code:

!pip install git+https://github.com/fastai/fastai.git
!wget http://files.fast.ai/data/dogscats.zip
!unzip dogscats.zip
!mkdir data
!mv dogscats data

These commands make the notebook run smoothly.
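
Since the virtual machine can survive between sessions, it may be worth checking what is already in place before repeating the setup. A minimal sketch, assuming the package is named fastai and the data ends up in data/dogscats as in the commands above (the helper name `missing_setup_steps` is made up):

```python
import importlib.util
from pathlib import Path


def missing_setup_steps(package="fastai", data_dir="data/dogscats"):
    """List the setup steps that still need to run in this session."""
    steps = []
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(package) is None:
        steps.append(f"install {package}")
    if not Path(data_dir).is_dir():
        steps.append(f"download {data_dir}")
    return steps


print(missing_setup_steps())
```

An empty list means the library is importable and the data folder exists, so the install and download commands can be skipped.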


@prakashgupta, I don't quite understand the use of Clouderizer. I understand that it provides scripts which are imported into Google Colab to create the environment required for the course, but what happens with the three folders and the sync to Drive? If, say, I download the dataset for the first lesson, will it be saved to my Drive account, to Clouderizer, or to neither? Thank you for your help in the forum. I read your previous posts but I couldn't understand them completely.

@mizar

Here is an article which explains how the code, data and out folders work in Clouderizer projects.

Hope this clarifies.

-Prakash

I'm adding modified Google Colab compatible notebooks here: https://github.com/bhoomit/fastai-dl1-colab