Colaboratory and Fastai

Maybe consider using Clouderizer? I got it working within minutes.


A third option is to use Clouderizer Drive. If the data folder is not visible on Drive, we can always create one under our project. Anything we upload to the data folder gets pushed to the machine running the Clouderizer project within 2 minutes. This article describes the 2-way sync of Clouderizer folders with Drive.

Also, specifically for Kaggle datasets, there is a very nice integration with the Kaggle API. We just need to specify the competition name in our project settings and Clouderizer will automatically download the dataset from the Kaggle site.
Here is a detailed article about this feature.
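For comparison, the same download can be done by hand with the official `kaggle` command-line tool. The minimal sketch below only builds the `kaggle competitions download` command (`-c` selects the competition, `-p` the output folder); it assumes the `kaggle` package is installed and authenticated, the competition name and folder are hypothetical examples, and it is not necessarily how Clouderizer implements the feature.

```python
import shlex
from typing import List


def kaggle_download_cmd(competition: str, dest: str) -> List[str]:
    """Build the Kaggle CLI command that downloads a competition's files.

    `-c` names the competition and `-p` is the directory to download into.
    """
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


# Hypothetical competition name and data path, for illustration only.
cmd = kaggle_download_cmd("dogs-vs-cats", "data/dogs-vs-cats")
print(shlex.join(cmd))
# To actually run it (needs credentials in ~/.kaggle/kaggle.json):
# import subprocess; subprocess.run(cmd, check=True)
```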


Hi @prakashgupta, what is the size of the Clouderizer Drive for a free account?

As much as is available in the linked Google Drive. Clouderizer itself does not impose any limits.

Hi Prakash. Thank you for the service.

I know this is not the right place for Clouderizer support, but I have 2 questions.

The automatic 2-way data sync to Google Drive is a life-saver. However, during my testing I noticed a slight delay of roughly 30 seconds (I can't confirm the exact figure). Is this supposed to happen? Why is it not real-time? I need to know this so I can plan to run a script that saves my model training state (data) and prevents loss.

The second issue is that when I create a new Jupyter notebook (the default filename is ‘Untitled.ipynb’) and then rename it, an orphan file Untitled.ipynb is left behind in Google Drive.

Thanks.

OK, I finally managed to tweak the first lesson so that it runs to completion on Colab. See the notes and notebooks here:

The 2-way sync is actually triggered every 2 minutes, so it takes anywhere from 0 to 2 minutes for any updates in your code and out folders to show up on Google Drive. Similarly, anything uploaded to the data folder will be downloaded to the machine within 2 minutes. This workflow is detailed in the help article in my reply above. Sync is generally an expensive operation (especially for folders containing a large number of files), and hence we chose not to sync too often.
Let me know what issue this delay of up to 2 minutes is causing for you. We are open to aligning with what the community needs.
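Because the sync runs on a timer, a pass could start while a large checkpoint is still being written. One common way to guard against uploading a half-written file is to write to a temporary file and then rename it over the final name, so that name only ever refers to a complete file. A minimal sketch, assuming pickle-able training state; the helper name is hypothetical, and note the temporary file briefly lives in the same folder.

```python
import os
import pickle
import tempfile


def save_checkpoint_atomically(state, path):
    """Write `state` to `path` so the final filename never holds a partial file.

    Data is written to a temporary file in the same directory, flushed to
    disk, then renamed over the target with os.replace (atomic on POSIX when
    both names are on the same filesystem).
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave the temporary file behind if anything failed.
        os.remove(tmp_path)
        raise
```

For example, calling `save_checkpoint_atomically(state, "out/checkpoint.pkl")` at the end of each epoch (the `out` path is an assumption) and waiting at least one sync interval, about 2 minutes, after the last save before stopping the machine should give the sync a chance to pick it up.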

I’m afraid the second issue is also by design 🙂. Sync back from the machine to Google Drive only modifies and adds files; it never deletes them. This makes sure an accidental delete on your machine doesn’t delete the data in the cloud. Again, I am open to your feedback and suggestions on this.
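To see why a rename leaves an orphan behind, here is a minimal sketch of an add-and-modify-only sync between two local folders. It is a hypothetical illustration of the policy described above, not Clouderizer's actual code: a file counts as changed when it is missing at the destination or its size or modification time differs, and nothing at the destination is ever deleted.

```python
import os
import shutil


def additive_sync(src, dst):
    """Copy new or modified files from src to dst; never delete anything in dst.

    Returns the relative paths that were copied on this pass.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        for name in files:
            s = os.path.join(root, name)
            rel = os.path.normpath(os.path.join(rel_root, name))
            d = os.path.join(dst, rel)
            if os.path.exists(d):
                s_stat, d_stat = os.stat(s), os.stat(d)
                if (s_stat.st_size == d_stat.st_size
                        and int(s_stat.st_mtime) == int(d_stat.st_mtime)):
                    continue  # unchanged since the last pass
            os.makedirs(os.path.dirname(d), exist_ok=True)
            shutil.copy2(s, d)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied
```

Renaming `Untitled.ipynb` in `src` and syncing again copies the new name, while the old copy in `dst` stays put, which is exactly the orphan file reported above.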


Hi Manikanta,

On Google Colab, when I try to run these fastai imports:

from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

they fail with this error:

ModuleNotFoundError: No module named 'fastai'

How can I install fastai so that these imports work?

Thanks
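The error means Python cannot find a `fastai` package in the current runtime, and a fresh Colab runtime does not keep packages installed in an earlier session. Before importing, a stdlib-only check like the sketch below reports which of these modules are missing. Note that `fastai.conv_learner`, `fastai.sgdr` and friends come from the older pre-1.0 fastai layout used in this course, so a current fastai release would still report them as missing (assumption: you would then need to install an older, pinned release).

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import.

    For dotted names, find_spec imports the parent package first, which
    raises ModuleNotFoundError when the top-level package is absent.
    """
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            missing.append(name)
    return missing


fastai_modules = ["fastai.transforms", "fastai.conv_learner", "fastai.model",
                  "fastai.dataset", "fastai.sgdr", "fastai.plots"]
print(missing_modules(fastai_modules))
```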

@prakashgupta, since it is not really practical to upload large files to Google Drive, I tried downloading the data directly in the Clouderizer terminal using CurlWget. The download succeeded, but I could not unzip it. I tried installing unzip using sudo, but that didn’t work either. Do you know a way to unzip from the Clouderizer terminal?
Thanks

Don’t use sudo in Google Colab; the runtime already runs as root. Try installing without sudo:
apt install unzip
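If installing `unzip` is not an option at all, Python's standard-library `zipfile` module can do the extraction from a notebook cell or a `python` command in the terminal. A minimal sketch; the helper name and any paths you pass it are placeholders.

```python
import zipfile


def unzip(archive_path, dest_dir):
    """Extract a .zip archive with the standard library (no `unzip` binary needed).

    Returns the names of the archive members.
    """
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

For example, `unzip("data/train.zip", "data/")`, with paths adjusted to wherever the download landed.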

You can also specify the unzip package in the Clouderizer project itself; it will then be installed automatically when the project starts. Have a look at this doc.

You can also put your curl/wget download command, followed by the unzip command, in the Clouderizer project under the startup script (Workspace tab).
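A startup script runs on every project start, so it helps if the download step is idempotent. The sketch below expresses the same download-then-unzip idea in Python; `fetch_and_extract`, the marker filename, and the URL/folder you pass in are hypothetical. A marker file records a completed extraction so re-running on the same machine skips the work; on a brand-new machine the marker is gone and the data is fetched again.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url, dest_dir):
    """Download a zip archive into dest_dir and extract it, unless already done.

    Returns True if it downloaded and extracted, False if the marker existed.
    """
    marker = os.path.join(dest_dir, ".extracted")
    if os.path.exists(marker):
        return False  # already downloaded and extracted on this machine
    os.makedirs(dest_dir, exist_ok=True)
    archive, _headers = urllib.request.urlretrieve(
        url, os.path.join(dest_dir, "download.zip"))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    os.remove(archive)  # free disk space once extracted
    open(marker, "w").close()
    return True
```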


Thanks. That worked for me.

If you download a dataset in the Clouderizer terminal, is it temporary? I find it is gone the next time I restart Clouderizer: it is no longer in the data folder, although I could use it in the previous session. Do I need to download the data every session? Thanks


Hey,

Were you able to fix this problem?

The data folder is synced down (Google Drive -> machine).
The code and out folders are synced up (machine -> Google Drive).

Please see the last section of the help article above for details.

In your case, you are downloading directly to the data folder on the machine. This will not be persisted to Google Drive; this is by design.

If you wish to persist this data, try downloading it into the code folder instead. It should then sync to Google Drive.
Don’t forget to update the dataset path in your notebook to accommodate this path change.
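The rule of thumb above can be written down as a tiny lookup. In this sketch the folder names and directions come from this reply, while the helper name and whatever project root you pass in are hypothetical.

```python
from pathlib import PurePosixPath

# Sync direction per top-level project folder, as described in this thread.
SYNC_DIRECTION = {
    "data": "down",  # Google Drive -> machine only
    "code": "up",    # machine -> Google Drive
    "out": "up",     # machine -> Google Drive
}


def persists_to_drive(path, project_root):
    """Return True if a file written at `path` on the machine is synced up to Drive.

    Raises ValueError if `path` is not inside `project_root`.
    """
    rel = PurePosixPath(path).relative_to(project_root)
    top = rel.parts[0] if rel.parts else ""
    return SYNC_DIRECTION.get(top) == "up"
```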


Thanks. That helped me a lot.


Simply change the working directory:
import os
os.chdir("drive/app")  # relative to the current directory, e.g. where Drive is mounted

Now try to download the image folder.
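`os.chdir` changes the directory for the rest of the session, so later cells that use relative paths will also resolve against `drive/app`. If you only want the download to land there, a small context manager restores the previous directory afterwards. A sketch with a hypothetical helper name; Python 3.11+ ships an equivalent `contextlib.chdir`.

```python
import contextlib
import os


@contextlib.contextmanager
def working_dir(path):
    """Temporarily change the working directory, restoring the previous one on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)
```

Usage: `with working_dir("drive/app"):` followed by the download commands in the indented block.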

I ran into this problem several times, and today I found a solution. Check how much free GPU RAM you have (in my case 0% was free). If there is little memory left, clean up your GPU RAM and then reconnect. You will find that your GPU RAM is 100% free again!

To check your GPU memory:

!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil psutil humanize
import os
import psutil
import humanize
import GPUtil as GPU
GPUs = GPU.getGPUs()
# XXX: Colab provides only one GPU, and even that isn't guaranteed
gpu = GPUs[0]
def printm():
    # Free system RAM and this process's resident memory
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), "| Proc size: " + humanize.naturalsize(process.memory_info().rss))
    # GPU memory as reported by GPUtil (values in MB)
    print('GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB'.format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))
printm()
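If you'd rather not install extra packages, the same numbers can be read from `nvidia-smi`'s CSV query mode, and the "is there little memory left?" check can be made explicit. A sketch; the helper names are hypothetical and the 10% threshold is an arbitrary assumption.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse output of
    `nvidia-smi --query-gpu=memory.free,memory.total --format=csv,noheader,nounits`.

    Returns a list of (free_mb, total_mb) tuples, one per GPU.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        free, total = (int(x) for x in line.split(","))
        rows.append((free, total))
    return rows


def needs_reset(free_mb, total_mb, min_free_fraction=0.1):
    """Hypothetical rule: flag the GPU when less than min_free_fraction is free."""
    return free_mb / total_mb < min_free_fraction


def query_gpu_memory():
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```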

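To clean up and cut the connection (this kills every process you own in the VM, including the notebook kernel, so you will need to reconnect):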

!kill -9 -1

Hope that helps!

Hey, from what I understand, every time you start a notebook on Colab, all that setup and installation has to happen again. That means you spend around an hour on installation before you can even start working. Please correct me if I’m wrong.

Hey Shubham, it’s not that bad. In my experience, setting up the environment takes up to 10 minutes, can be almost fully automated, and needs to be done roughly once per day (since a virtual machine can live for up to 12 hours).
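The "almost fully automated" part can be as simple as one setup cell guarded by a marker file, so re-running the notebook on the same VM skips the installs. A sketch; the helper name, command list, and marker path are placeholders. Because the marker lives on the VM's local disk, it disappears when the VM is recycled and the setup runs again.

```python
import os
import subprocess


def run_setup_once(commands, marker_path):
    """Run shell setup commands only if marker_path doesn't exist yet.

    Returns True if the commands ran, False if they were skipped.
    """
    if os.path.exists(marker_path):
        return False
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)  # stop on the first failure
    with open(marker_path, "w") as f:
        f.write("done\n")
    return True
```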

Google Colab is a fantastic resource. It just takes time to get used to it. For studying (and some development) I’d say it’s more convenient than AWS.

The biggest drawback is the limited RAM (12 GB), so you need to be aware of your notebook’s memory consumption (and keep it down) at all times.
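One way to find out which cell is eating RAM is the standard-library `tracemalloc` module, which records the peak memory allocated through Python's tracked allocators. Memory allocated elsewhere (GPU memory, and buffers of C extensions that don't report to tracemalloc) is not counted. A sketch with a hypothetical helper name; once you've found the culprit, `del` the large objects you no longer need and call `gc.collect()`.

```python
import contextlib
import tracemalloc


@contextlib.contextmanager
def track_peak_memory(label):
    """Measure peak Python-level allocations inside the `with` block.

    Yields a dict that receives "peak_bytes" when the block exits.
    """
    stats = {}
    tracemalloc.start()
    try:
        yield stats
    finally:
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats["peak_bytes"] = peak
        print(f"{label}: peak {peak / 2**20:.1f} MiB")
```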


Hey, can someone help me with how to set or update environment variables such as PATH in Google Colab with Clouderizer?
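From inside a notebook, one option is to edit `os.environ` directly (IPython's `%env` magic is another). Child processes started afterwards, including `!` shell commands in the notebook, inherit the new value. It does not affect a terminal opened separately, and it is lost when the runtime restarts. A sketch with a hypothetical helper name and directory:

```python
import os


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for this process, without duplicates.

    Returns the new PATH value.
    """
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep)
             if p and p != directory]
    os.environ["PATH"] = os.pathsep.join([directory] + parts)
    return os.environ["PATH"]
```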